THE PROBLEM OF THE GREATER MEAN


By RAGHU RAJ BAHADUR AND HERBERT ROBBINS¹
University of North Carolina 


1. Introduction and summary. Let $\pi_1, \pi_2$ be normal populations with means
$m_1, m_2$ respectively and a common variance $\sigma^2$, the parameter point
$\omega = (m_1, m_2 : \sigma)$ which characterizes the two populations being unknown, and
let $\Omega$ be an arbitrary given set of possible points $\omega$. Random samples of fixed
sizes $n_1, n_2$ are drawn from $\pi_1, \pi_2$ respectively, giving the combined sample
point $v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. For reasons which will be
made clear later in connection with practical examples, any function $f(v)$ such
that $0 \le f(v) \le 1$ is called a decision function, and for any such $f(v)$ the risk
function is defined to be

(1)  $r(f \mid \omega) = \max[m_1, m_2] - m_1 E[f \mid \omega] - m_2 E[1 - f \mid \omega] \ge 0,$

where $E$ denotes the expectation operator. A decision function $\bar f(v)$ is said to be
(a) uniformly better than $f(v)$ if $r(\bar f \mid \omega) \le r(f \mid \omega)$ for all $\omega$ in $\Omega$, the strict
inequality holding for at least one $\omega$, (b) admissible if no decision function is
uniformly better than $\bar f(v)$, and (c) minimax if

$\sup_{\omega \in \Omega} [r(\bar f \mid \omega)] = \inf_{f} \sup_{\omega \in \Omega} [r(f \mid \omega)].$

The "problem of the greater mean" is, for any given $\Omega$, to determine the minimax
decision functions, particularly those which are also admissible. Special
interest attaches to the case in which there exists a unique minimax decision
function $\bar f(v)$ (in the sense that if $f(v)$ is any minimax decision function then
$f(v) = \bar f(v)$ for almost every $v$ in the sample space); such an $\bar f(v)$ is automatically
admissible.
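
As a small illustration of definition (1), the risk depends on the decision function only through the single number $E[f \mid \omega]$. The sketch below (function and argument names are illustrative, not from the paper) evaluates (1) for a given value of that expectation.

```python
def risk(m1, m2, expected_f):
    """Risk (1): max[m1, m2] - m1*E[f|w] - m2*E[1 - f|w].

    expected_f stands for E[f|w], the average fraction assigned to
    population 1 when the true parameter point is w.
    """
    if not 0.0 <= expected_f <= 1.0:
        raise ValueError("E[f|w] must lie in [0, 1]")
    return max(m1, m2) - m1 * expected_f - m2 * (1.0 - expected_f)


if __name__ == "__main__":
    # Population 2 has the greater mean; putting weight on population 1 costs risk.
    for ef in (0.0, 0.25, 1.0):
        print(ef, risk(1.0, 2.0, ef))
```

The risk vanishes exactly when all the weight goes to the population with the greater mean, which is why it is non-negative.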

The problem of the greater mean is, of course, a special problem in Wald’s 
general theory of statistical decision functions [1]. Our results will, however, be 
derived by very simple direct methods which make no use of Wald’s general 
theorems. 

We cite without proofs a few examples in order to show how strongly the
solution of the problem of the greater mean depends on the structure of $\Omega$. In
each case the minimax decision function is a function only of the two sample
means $\bar x_1, \bar x_2$.

(i) Let $\Omega'$ consist of the two points $(a, b : \sigma)$ and $(b, a : \sigma)$, with $a < b$. Then

(2)  $f^*(v) = \begin{cases} 1 & \text{if } n_1 \bar x_1 - n_2 \bar x_2 > (n_1 - n_2)(a + b)/2, \\ 0 & \text{otherwise,} \end{cases}$

is the unique minimax decision function.
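
The rule of Example (i) can be sketched directly from the two samples. The function name `f_star` and the list-based interface are assumptions made for illustration; the threshold is the one stated in (2).

```python
def f_star(sample1, sample2, a, b):
    """Decision rule (2) for the two-point set {(a, b), (b, a)}.

    Returns 1 (choose population 1) when
    n1*xbar1 - n2*xbar2 > (n1 - n2)*(a + b)/2, and 0 otherwise.
    """
    n1, n2 = len(sample1), len(sample2)
    lhs = sum(sample1) - sum(sample2)  # equals n1*xbar1 - n2*xbar2
    return 1 if lhs > (n1 - n2) * (a + b) / 2.0 else 0


if __name__ == "__main__":
    # Unequal sample sizes: the threshold is shifted away from zero.
    print(f_star([1.0, 2.0, 3.0], [2.0, 2.0], a=0.0, b=1.0))
    # Equal sample sizes: the rule reduces to comparing the sample means.
    print(f_star([1.0, 2.0], [2.0, 3.0], a=0.0, b=1.0))
```

With equal sample sizes the right-hand side is zero, so the rule only asks whether the first sample mean exceeds the second.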


1 This work was supported in part by the Office of Naval Research. 
(ii) Let $\Omega''$ consist of the two points $(c + h, c : \sigma)$ and $(c - h, c : \sigma)$, with $h > 0$.
Then

(3)  $f_c^0(v) = \begin{cases} 1 & \text{if } \bar x_1 > c, \\ 0 & \text{otherwise,} \end{cases}$

is the unique minimax decision function.
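(iii) Let $\Omega'''$ consist of the three points $(\tfrac12, -\tfrac12 : 1)$, $(\tfrac12, \tfrac32 : 1)$, $(-\tfrac32, -\tfrac12 : 1)$, and
let $n_1 = n_2 = n$. Then

(4)  $f^{**}(v) = \begin{cases} 1 & \text{if } e^{-2n\bar x_1} + e^{2n\bar x_2} < \lambda, \\ 0 & \text{otherwise,} \end{cases}$

where $\lambda$ is a certain definite constant, is the unique minimax decision function.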

The parameter spaces of two or three points specified in these examples are
rather trivial, but in fact the corresponding decision functions (2), (3), (4)
remain the unique minimax solutions of the decision problem with respect to
much more general parameter spaces. Thus, for example, it is clear that $f^*(v)$
will remain the unique minimax decision function with respect to any $\Omega$ which
contains $\Omega'$ and is such that

$\sup_{\omega \in \Omega} [r(f^* \mid \omega)] = \sup_{\omega \in \Omega'} [r(f^* \mid \omega)].$

Corresponding remarks apply to $f_c^0(v)$ and $f^{**}(v)$.
When $n_1 = n_2$, (2) reduces to

(5)  $f^0(v) = \begin{cases} 1 & \text{if } \bar x_1 > \bar x_2, \\ 0 & \text{otherwise.} \end{cases}$

This decision function is of particular interest when both the means $m_1, m_2$ are
unknown. It will be shown that whether or not $n_1 = n_2$, $f^0(v)$ is the unique
minimax decision function under certain conditions on $\Omega$ which are likely to
hold in practice, at least when both $n_1$ and $n_2$ are sufficiently large (Theorem 3).
Likewise, $f_c^0(v)$, which is the analogue of $f^0(v)$ when one of the means ($m_2$) is
known exactly, is apt to be the unique minimax decision function in such cases,
at least when $n_1$ is sufficiently large (Theorem 4). These results on $f^0(v)$ and
$f_c^0(v)$ form the main results of the present paper.

So much by way of a general summary. We shall now give a practical il- 
lustration (another is given in Section 3) to show how the problem of the greater 
mean arises in applications. 

Suppose that a consumer requires a certain number of manufactured articles
which can be supplied at the same cost by each of two sources $\pi_1$ and $\pi_2$. The
quality of an article is measured by a numerical characteristic $x$, and it is known
that in the product of $\pi_i$, $x$ is normally distributed with mean $m_i$ and variance
$\sigma^2$, but the values of these parameters are unknown. The consumer has
obtained a random sample of $n_1$ and $n_2$ articles from $\pi_1$ and $\pi_2$ respectively, and
has found the values of $x$ to be $(x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2}) = v$.
What is the best way of ordering a total of $N$ articles from the two sources?
The usual statistical theory, which confines itself to estimating the unknown
parameters and to testing hypotheses of the form $H_0(m_1 = m_2)$, has at best an
indirect bearing on the problem at hand. We therefore adopt Wald's point of
view and investigate the consequences of any given course of action. If the
consumer orders $fN$ articles from $\pi_1$ and $(1 - f)N$ from $\pi_2$, where $0 \le f \le 1$,
then the expectation of the sum of the $x$-values in the articles he obtains will
be $N(m_1 f + m_2(1 - f))$. The maximum possible value of this quantity is
$N \max[m_1, m_2]$, and the "loss" per article which he sustains may therefore be
taken as

$W(\omega, f) = \max[m_1, m_2] - m_1 f - m_2(1 - f) \ge 0,$

where $\omega = (m_1, m_2 : \sigma)$ is the true parameter point.

The consumer wants to choose $f$ so as to make $W$ as small as possible. If
he knew $m_1$ to be greater, or to be less, than $m_2$, then by choosing $f = 1$ or $0$
respectively he could make $W = 0$. But since he does not know which $m_i$ is
the greater he will presumably choose $f$ as some function of the sample point $v$.
Suppose, therefore, that a "decision function" $f(v)$, such that $0 \le f(v) \le 1$ but
not necessarily taking on only the values 0 and 1, is defined for all points $v$ in
the sample space and that the consumer sets $f = f(v)$.² In repeated applications
of this procedure, the "risk" or expected loss (a double expectation is involved:
the expected loss for a given $f$ and the expected value of $f$ in using the
decision function $f(v)$) per article is given by (1), and the consumer will try to
find an $f(v)$ which minimizes this risk. Since the value of the risk depends on $\omega$
it is necessary to specify which values of $\omega$ are to be regarded as possible in
the given problem; let the set of all such $\omega$ be denoted by $\Omega$. If the consumer
agrees to adopt the "conservative" criterion of minimizing the maximum possible
risk, then the statistician's problem is to find the minimax decision functions
in the sense defined above. We have given the solutions of this problem
for certain types of parameter spaces. The reader will observe that each of the
minimax decision functions (2), (3), (4) was of the "all or nothing" type, with
values 0 and 1 only. (Whether this remains true for every $\Omega$ we do not know.)
By using one of these decision functions in a given instance one arrives at either
the best possible decision or the worst. The attitudes of doubt sometimes associated
with the non-rejection of the hypothesis $H_0(m_1 = m_2)$ are therefore


² One might say that the consumer should choose $f$ in the light of what he can infer from
$v$ about the $m_i$. But this formulation as a problem in ordinary statistical inference (estimation
and testing) is not relevant and may be misleading. For example, a plausible $f(v)$,
based on the idea that the problem is one of testing hypotheses, is as follows: "Perform the
two-tailed $t$ test of $H_0(m_1 = m_2)$ at the five per cent level. If $H_0$ is rejected set $f = 0$ or $1$
according as $\bar x_1$ is less than or greater than $\bar x_2$. If $H_0$ is not rejected set $f = \tfrac12$." Another
$f(v)$, based on the theory of estimation, according to which the $\bar x_i$ are the "best" estimates
of the $m_i$, is as follows: "Set $f = 0$ or $1$ according as $\bar x_1$ is less than or greater than $\bar x_2$."
Actually, the latter procedure is, from the remarks above concerning (5), the "best" in
a certain definite sense and under certain conditions, but this fact does not follow from the
usual theory of estimation.
irrelevant to the problem of the greater mean in the examples cited. (Cf. footnote 2;
also Example 1 in Section 3.)

The risk function (1) is but one of a general class $R$ of risk functions, to be
defined in Section 2, which are associated with the problem of the greater mean.
The most important members of $R$ are (1) and

(6)  $\bar r(f \mid \omega) = P(\text{incorrect decision using } f(v) \mid \omega),$

where "$m_1 < m_2$" and "$m_1 > m_2$" are the two possible decisions. The risk function
(6) is relevant to applications of a purely "scientific" nature in which the
statistician is asked merely to give his opinion as to which population has
the greater mean. Although the problem of constructing a suitable decision
function for (6) is akin in spirit to the problems considered in the now classical
Neyman-Pearson theory of statistical tests, no satisfactory solutions seem to
be available. It is easy to see, however, that (1) and (6) are quite similar. Of
course, in the case of (1) a decision function $f(v)$ may take on any value between
0 and 1 inclusive, while for (6) we allow only functions which take on
only the values 0 and 1, corresponding respectively to the decisions "$m_1 < m_2$"
and "$m_1 > m_2$". We then have for any such $f(v)$,

(6')  $\bar r(f \mid \omega) = \begin{cases} P(f(v) = 1 \mid \omega) = E[f \mid \omega] & \text{if } m_1 < m_2, \\ P(f(v) = 0 \mid \omega) = E[1 - f \mid \omega] & \text{if } m_1 > m_2, \\ 0 & \text{if } m_1 = m_2, \end{cases}$

and by comparison with (1) we see that $r(f \mid \omega) = |m_1 - m_2| \, \bar r(f \mid \omega)$ for all $\omega$.

Now, in the three examples (i), (ii), (iii) cited above the unique minimax decision
functions happen to take on only the values 0 and 1, and $|m_1 - m_2|$ is constant
on each of the respective parameter sets. It follows that (2), (3), (4) are also
the unique minimax decision functions relative to (6) and to $\Omega'$, $\Omega''$, $\Omega'''$ respectively.
The remarks above following Example (iii) also remain valid for the
risk function (6).
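
The identity $r(f \mid \omega) = |m_1 - m_2| \, \bar r(f \mid \omega)$ can be checked numerically for any value of $E[f \mid \omega]$. The following sketch uses illustrative function names (`risk_1`, `risk_6`) that do not appear in the paper.

```python
def risk_1(m1, m2, expected_f):
    """Risk (1) as a function of E[f|w]."""
    return max(m1, m2) - m1 * expected_f - m2 * (1.0 - expected_f)


def risk_6(m1, m2, expected_f):
    """Risk (6'): probability of an incorrect decision for a 0/1-valued f.

    expected_f is E[f|w] = P(f(v) = 1 | w), the chance of deciding "m1 > m2".
    """
    if m1 < m2:
        return expected_f          # deciding "m1 > m2" is the error
    if m1 > m2:
        return 1.0 - expected_f    # deciding "m1 < m2" is the error
    return 0.0                     # no decision is counted wrong when m1 = m2


if __name__ == "__main__":
    for m1, m2, ef in [(0.0, 2.0, 0.3), (5.0, 1.0, 0.9), (1.0, 1.0, 0.5)]:
        print(risk_1(m1, m2, ef), abs(m1 - m2) * risk_6(m1, m2, ef))
```

Because the two risks differ only by the factor $|m_1 - m_2|$, they lead to the same minimax rule whenever that factor is constant on the parameter set, as in Examples (i) to (iii).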

We conclude this section with a remark on the methods of this paper. Any
decision function relevant to (6) is equivalent to a test of the hypothesis
$H_0(m_1 < m_2)$ against the alternative $H_1(m_1 > m_2)$, the region $\{v : f(v) = 1\}$ being the
"critical region." Hence the Neyman-Pearson probability ratio method can be
used to obtain the unique minimax decision function with respect to (6) and
an $\Omega$ consisting of two (or more) points, and the result carries over to more
general types of $\Omega$ in the manner already indicated. It turns out, however, that
the dominant properties of the probability ratio tests are not confined to
the class of tests alone, but extend to the class of all functions $f(v)$ such that
$0 \le f(v) \le 1$. This result (Theorem 1) enables us to solve the problem of the
greater mean for the risk function (1) as well as for (6). The reader who is interested
in applications may turn to Section 3.


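2. Theorems. We require the following slight generalization of a well-known
result of Neyman and Pearson [2].

THEOREM 1. Let $\phi(v), \phi_1(v), \phi_2(v), \ldots, \phi_r(v)$ be summable functions defined on
a measure space $E$ with points $v$ and measure $\mu$, $\mu(E) \le \infty$, let $c_1, \ldots, c_r$ be
arbitrary constants, and let $A \subseteq E$ be such that

(7)  $v \in A$ implies $\phi(v) \ge \sum_{i=1}^{r} c_i \phi_i(v)$,
     $v \in E - A$ implies $\phi(v) \le \sum_{i=1}^{r} c_i \phi_i(v)$.

Set

(8)  $\int_A \phi_i \, d\mu = a_i \quad (i = 1, \ldots, r),$

and let $f(v)$ be any measurable function such that

(9)  $0 \le f(v) \le 1$

and such that

(10)  $\int_E f \phi_i \, d\mu = a_i \quad (i = 1, \ldots, r).$

Then

(11)  $\int_E f \phi \, d\mu \le \int_A \phi \, d\mu.$

PROOF.

$\begin{aligned}
\int_E f\phi \, d\mu &= \int_A f\phi \, d\mu + \int_{E-A} f\phi \, d\mu \\
&\le \int_A f\phi \, d\mu + \int_{E-A} f \Big( \sum_{i=1}^{r} c_i \phi_i \Big) d\mu && \text{by (9), (7),} \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \int_{E-A} f\phi_i \, d\mu \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \Big[ \int_E f\phi_i \, d\mu - \int_A f\phi_i \, d\mu \Big] \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \Big[ a_i - \int_A f\phi_i \, d\mu \Big] && \text{by (10),} \\
&= \int_A f\phi \, d\mu + \sum_{i=1}^{r} c_i \Big[ \int_A (1 - f)\phi_i \, d\mu \Big] && \text{by (8),} \\
&= \int_A \phi \, d\mu - \int_A (1 - f)\phi \, d\mu + \int_A (1 - f) \Big( \sum_{i=1}^{r} c_i \phi_i \Big) d\mu \\
&= \int_A \phi \, d\mu - \int_A (1 - f) \Big( \phi - \sum_{i=1}^{r} c_i \phi_i \Big) d\mu \\
&\le \int_A \phi \, d\mu && \text{by (9), (7).}
\end{aligned}$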
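
Theorem 1 can be checked on a toy example. The sketch below assumes a six-point space with counting measure, a single side condition ($r = 1$), and made-up values for $\phi$ and $\phi_1$; none of these numbers come from the paper. It compares $\int_E f\phi \, d\mu$ with $\int_A \phi \, d\mu$ for a competitor $f$ that satisfies (9) and (10).

```python
def theorem_1_sides(phi, phi1, c1, f):
    """Return (A, integral of f*phi over E, integral of phi over A).

    E is {0, ..., len(phi)-1} with counting measure and A = {v: phi > c1*phi1},
    which satisfies condition (7). Conditions (9) and (10) on f are checked.
    """
    points = range(len(phi))
    A = [v for v in points if phi[v] > c1 * phi1[v]]
    a1 = sum(phi1[v] for v in A)                         # definition (8)
    assert all(0.0 <= f[v] <= 1.0 for v in points)       # condition (9)
    assert abs(sum(f[v] * phi1[v] for v in points) - a1) < 1e-12  # condition (10)
    lhs = sum(f[v] * phi[v] for v in points)
    rhs = sum(phi[v] for v in A)
    return A, lhs, rhs


PHI = [0.05, 0.10, 0.15, 0.20, 0.20, 0.30]    # hypothetical values of phi
PHI1 = [0.30, 0.20, 0.20, 0.15, 0.10, 0.05]   # hypothetical values of phi_1

if __name__ == "__main__":
    a1 = sum(p for p, q in zip(PHI1, PHI) if q > p)
    constant_f = [a1 / sum(PHI1)] * len(PHI)  # a constant f meeting (10)
    print(theorem_1_sides(PHI, PHI1, 1.0, constant_f))
```

Taking $f$ to be the characteristic function of $A$ gives equality in (11), which is the case treated in Note 1.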
NOTE 1. If the condition

(12)  $\mu \Big\{ v : \phi(v) = \sum_{i=1}^{r} c_i \phi_i(v) \Big\} = 0$

holds, then in order that the equality hold in (11) it is necessary and sufficient that

(13)  $f(v) = \chi_A(v)$ a.e. $(\mu)$,

where $\chi_A(v)$ is the characteristic function of the set $A$,

$\chi_A(v) = \begin{cases} 1 & \text{if } v \in A, \\ 0 & \text{if } v \in E - A. \end{cases}$

PROOF. The sufficiency is obvious. To prove the necessity we observe from the
proof of Theorem 1 that for equality to hold in (11) it is necessary that

$f(v) \Big( \phi(v) - \sum_{i=1}^{r} c_i \phi_i(v) \Big) = 0$ a.e. $(\mu)$ in $E - A$,

and that

$(1 - f(v)) \Big( \phi(v) - \sum_{i=1}^{r} c_i \phi_i(v) \Big) = 0$ a.e. $(\mu)$ in $A$.

These relations and (12) imply (13).

NOTE 2. If relations (10) are replaced by

(10')  $\int_E f \phi_i \, d\mu \le a_i \quad (i = 1, \ldots, r),$

and if each of the constants $c_i$ is non-negative, then Theorem 1 and Note 1 remain
valid.

Theorem 1 has applications to a number of decision problems of a certain
type. In the present paper we consider only the "problem of the greater mean"
for two normal populations with a common variance $\sigma^2$, where at least one of
the means $m_1, m_2$ is unknown. The following assumptions and definitions will
be valid henceforth.

(A) $E_N$ is the $N = n_1 + n_2$ dimensional sample space of points
$v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. A measurable function $f(v)$ defined
for all $v$ in $E_N$ is a decision function if $0 \le f(v) \le 1$. $f_1(v) \equiv f_2(v)$ means
$f_1(v) = f_2(v)$ for almost every $v$ in $E_N$.

(B) $\Omega$ is a given set of points $\omega = (m_1, m_2 : \sigma)$, $\sigma > 0$. Given $\omega$ in $\Omega$, the probability
measure in $E_N$ is that generated by the distribution function

$K(v \mid \omega) = \prod_{i=1}^{2} \prod_{j=1}^{n_i} G[(x_{ij} - m_i)/\sigma],$

where

$G(z) = (2\pi)^{-1/2} \int_{-\infty}^{z} e^{-y^2/2} \, dy.$

Given any function $\phi = \phi(v)$ for which the integral exists we write

$E[\phi \mid \omega] = \int_{E_N} \phi(v) \, dK(v \mid \omega).$


(C) Let $\gamma(\omega) = (g_1, g_2)$ be a function defined for all $\omega$ in $\Omega$, with values in
$E_2$, and such that

(14)  $m_i \le m_j$ implies $g_i \le g_j \quad (i, j = 1, 2).$

Given $p$, $0 \le p \le 1$, we define

$W(\omega, p) = \max[g_1, g_2] - g_1 p - g_2(1 - p),$

and given a decision function $f(v)$ we define the risk function

(15)  $r(f \mid \omega) = E[W(\omega, f) \mid \omega] = W(\omega, E[f \mid \omega]) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega].$

The class of risk functions (15) corresponding to all functions $\gamma(\omega)$ which satisfy
(14) is denoted by $R$. (The two most important members of $R$ are (1), with

$\gamma(\omega) = (m_1, m_2),$

and (6), with

$\gamma(\omega) = \begin{cases} (0, 1) & \text{if } m_1 < m_2, \\ (1, 0) & \text{if } m_1 > m_2, \\ (0, 0) & \text{if } m_1 = m_2. \end{cases}$

The risk functions (1) and (6) appear in the examples in Section 3.) Throughout
this section $r(f \mid \omega)$ will denote a fixed but arbitrary member of $R$. We shall use
the notations

$h(\omega) = |g_1 - g_2|,$

$d(\omega) = \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{-1/2} (m_1 - m_2)/\sigma,$

$\bar x_i = n_i^{-1} \sum_{j=1}^{n_i} x_{ij} \quad (i = 1, 2).$

THEOREM 2. Let $\omega_1 = (m_1, m_2 : \sigma)$ and $\omega_2 = (\mu_1, \mu_2 : \sigma)$ be two parameter points
such that

$d(\omega_1) < 0, \quad d(\omega_2) > 0, \quad h(\omega_1) h(\omega_2) > 0.$

For any $\lambda$, $-\infty \le \lambda \le \infty$, let $f_\lambda(v)$ be the characteristic function of the set

(16)  $A_\lambda = \{ v : n_1(\mu_1 - m_1)\bar x_1 + n_2(\mu_2 - m_2)\bar x_2 > \lambda\sigma \}.$

Then
(i) Corresponding to any decision function $f(v)$, there exists a $\lambda$ such that

$r(f_\lambda \mid \omega_1) = r(f \mid \omega_1), \quad r(f_\lambda \mid \omega_2) \le r(f \mid \omega_2);$

the inequality is strict unless $f(v) \equiv f_\lambda(v)$.
(ii) Given any $\lambda$, if $f(v)$ is a decision function such that

$r(f \mid \omega_i) \le r(f_\lambda \mid \omega_i) \quad (i = 1, 2),$

then

$f(v) \equiv f_\lambda(v).$

(iii) There exists a unique $c$ such that

(17)  $r(f_c \mid \omega_1) = r(f_c \mid \omega_2) = \beta$ say,

and for any decision function $f(v)$ we have

(18)  $\beta \le \max[r(f \mid \omega_1), r(f \mid \omega_2)];$

the inequality is strict unless $f(v) \equiv f_c(v)$. It follows that $f_c(v)$ is the unique minimax
decision function corresponding to the two-point parameter space $\Omega = (\omega_1, \omega_2)$.

PROOF.³ (a) Let $\phi(v)$, $\phi_1(v)$ be the joint frequency functions of the sample
point $v$ corresponding to the parameter points $\omega_2$, $\omega_1$ respectively. It is readily
seen that for any $\lambda$ there exists a unique constant $a(\lambda)$, $0 \le a(\lambda) \le \infty$, such
that

$A_\lambda = \{ v : \phi(v) > a \phi_1(v) \}$

($a(-\infty) = 0$, $a(\infty) = \infty$). Moreover, since $\omega_1 \ne \omega_2$,

$\mu\{ v : \phi(v) = a \phi_1(v) \} = 0.$

It follows from Theorem 1, Note 2, that if $f(v)$ is any decision function such
that

$E[f \mid \omega_1] \le E[f_\lambda \mid \omega_1],$

then

$E[f \mid \omega_2] \le E[f_\lambda \mid \omega_2],$

and the strict inequality holds unless $f(v) \equiv f_\lambda(v)$.
(b) It is clear from the definition (16) that for any fixed parameter point $\omega$
the function

$E[f_\lambda \mid \omega] = P(A_\lambda \mid \omega)$

is continuous and strictly decreasing from 1 to 0 as $\lambda$ varies from $-\infty$ to $+\infty$.
(c) For any decision function $f(v)$ and any parameter point $\omega$ we have by (C),

$r(f \mid \omega) = \max[g_1, g_2] - g_1 E[f \mid \omega] - g_2 E[1 - f \mid \omega].$

Hence

(19)  $r(f \mid \omega_1) = h(\omega_1) E[f \mid \omega_1], \quad h(\omega_1) > 0,$
      $r(f \mid \omega_2) = h(\omega_2) E[1 - f \mid \omega_2], \quad h(\omega_2) > 0.$

Since for any decision function $f(v)$, $0 \le E[f \mid \omega_1] \le 1$, we can by (b) choose $\lambda$
so that

(20)  $E[f_\lambda \mid \omega_1] = E[f \mid \omega_1],$

and by (a) it follows that unless $f(v) \equiv f_\lambda(v)$,

(21)  $E[f_\lambda \mid \omega_2] > E[f \mid \omega_2].$

(i). Follows from (19), (20) and (21).
(ii). Follows from (19) and (a).
(iii). (17) follows from (19) and (b). Then (18) follows from (17) and (ii).

³ Theorem 2 (as also Example (iii) of Section 1) could be derived from Wald's general
results on the completeness of the class of Bayes solutions of statistical decision problems.
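
The constant $c$ of Theorem 2(iii) can be found numerically, since the statistic in (16) is normally distributed under each parameter point and the two risks in (19) move in opposite directions as $\lambda$ grows. The sketch below is an illustration only: the function names are invented, and it assumes the risk function (1), so that $h(\omega) = |m_1 - m_2|$.

```python
import math


def G(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def two_point_minimax(w1, w2, n1, n2):
    """Solve (17) for c by bisection; returns (c, beta).

    w1 = (m1, m2, sigma) with m1 < m2 and w2 = (mu1, mu2, sigma) with mu1 > mu2.
    Assumes risk function (1), i.e. h(w) = |m1 - m2|.
    """
    (m1, m2, sigma), (u1, u2, _) = w1, w2
    k1, k2 = n1 * (u1 - m1), n2 * (u2 - m2)       # coefficients in (16)
    sd = sigma * math.sqrt(k1 * k1 / n1 + k2 * k2 / n2)
    mean1 = k1 * m1 + k2 * m2                     # mean of the statistic under w1
    mean2 = k1 * u1 + k2 * u2                     # mean of the statistic under w2
    h1, h2 = abs(m1 - m2), abs(u1 - u2)

    def risks(lam):
        t = lam * sigma
        p1 = 1.0 - G((t - mean1) / sd)            # P(A_lambda | w1)
        p2 = 1.0 - G((t - mean2) / sd)            # P(A_lambda | w2)
        return h1 * p1, h2 * (1.0 - p2)           # the two lines of (19)

    lo, hi = (mean1 - 12.0 * sd) / sigma, (mean2 + 12.0 * sd) / sigma
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        r1, r2 = risks(mid)
        if r1 > r2:
            lo = mid    # risk at w1 still larger: raise the threshold
        else:
            hi = mid
    c = 0.5 * (lo + hi)
    return c, risks(c)[0]


if __name__ == "__main__":
    # Example (i) with a = 0, b = 1, n1 = 3, n2 = 2, sigma = 1.
    print(two_point_minimax((0.0, 1.0, 1.0), (1.0, 0.0, 1.0), 3, 2))
```

For Example (i) of Section 1 the solution agrees with the threshold $(n_1 - n_2)(a + b)/2$ in (2).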


Theorem 2 provides the solution of any problem of the greater mean when $\Omega$
consists of just two points $\omega_1, \omega_2$. For, the problem is trivial unless $d(\omega_1) d(\omega_2) < 0$
and $h(\omega_1) h(\omega_2) > 0$, and in the non-trivial case the unique minimax decision
function is $f_c(v)$ defined by (17). Moreover, it follows at once from the definition
that if $f(v)$ is the unique minimax decision function with respect to some
parameter set $\Omega_0$, then it remains so with respect to any $\Omega$ such that $\Omega \supset \Omega_0$ and

$\sup_{\omega \in \Omega} [r(f \mid \omega)] = \sup_{\omega \in \Omega_0} [r(f \mid \omega)].$

By taking sets $\Omega_0$ which consist of two points, Theorem 2 can therefore be used
to obtain sufficient conditions for an $f(v) = f_c(v)$ to be the unique minimax
decision function with respect to a quite general $\Omega$. (It is clear that results
analogous to Theorem 2(iii) but pertaining to more than two parameter points
can be derived from Theorem 1, and that these results can be exploited in a
similar way. An instance of this procedure where $\Omega_0$ consists of three points will
be given at the end of this section.)

The theorems which follow exploit Theorem 2 in this way to obtain conditions
on $\Omega$ under which the decision functions $f^0(v)$ and $f_c^0(v)$ defined by (5) and (3)
are minimax. We consider $f^0(v)$ first. From (C) we have, after a simple computation,

(22)  $r(f^0 \mid \omega) = h(\omega) \cdot G(-|d(\omega)|).$
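
Formula (22) is easy to evaluate. The following sketch uses illustrative names and takes $h(\omega)$ as an argument, so that it covers both risk (1), with $h = |m_1 - m_2|$, and risk (6), with $h = 1$ when $m_1 \ne m_2$.

```python
import math


def G(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def d(m1, m2, sigma, n1, n2):
    """Standardized difference d(w) = (1/n1 + 1/n2)^(-1/2) * (m1 - m2) / sigma."""
    return (m1 - m2) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))


def risk_f0(m1, m2, sigma, n1, n2, h=None):
    """Risk (22) of the rule 'choose population 1 iff xbar1 > xbar2'.

    h defaults to |m1 - m2|, which corresponds to risk function (1).
    """
    if h is None:
        h = abs(m1 - m2)
    return h * G(-abs(d(m1, m2, sigma, n1, n2)))


if __name__ == "__main__":
    print(risk_f0(0.0, 1.0, 1.0, 2, 2))          # risk (1)
    print(risk_f0(0.0, 1.0, 1.0, 2, 2, h=1.0))   # risk (6), error probability
```

The risk depends on the means only through their difference, which is what makes the symmetric pairs of points in Theorem 3 useful.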


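THEOREM 3. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega'_k\}$ of points
$\omega_k = (m_{1k}, m_{2k} : \sigma_k)$, $\omega'_k = (\mu_{1k}, \mu_{2k} : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \quad (\ne 0, \infty),$

(ii) $d(\omega_k) = -d(\omega'_k)$, $h(\omega_k) = h(\omega'_k)$, and $n_1 m_{1k} + n_2 m_{2k} = n_1 \mu_{1k} + n_2 \mu_{2k}$ for
every $k = 1, 2, \ldots$

Then $f^0(v)$ is an admissible minimax decision function. If there exist
$\omega_0 = (m_1, m_2 : \sigma)$, $\omega'_0 = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f^0(v)$ is the
unique minimax decision function.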

PROOF. By (22) and (ii),

(23)  $r(f^0 \mid \omega_k) = r(f^0 \mid \omega'_k)$ for every $k$.

Without loss of generality, we may assume the two sequences to be so chosen
that $h(\omega_k) = h(\omega'_k) > 0$ for every $k$. Then, by interchanging corresponding
members if necessary, we may assume that

(24)  $d(\omega_k) = -d(\omega'_k) < 0$ for every $k$.

Consider the two points $\omega_k, \omega'_k$ in $\Omega$ with arbitrary but fixed $k$. Writing $\omega_k, \omega'_k$
for $\omega_1, \omega_2$ respectively, and using conditions (ii), a simple calculation shows
that the set defined by (16) is

(25)  $A_\lambda = \{ v : \bar x_1 - \bar x_2 > L \},$

$L$ being a strictly increasing function of $\lambda$.
Choose and fix an arbitrary decision function $f(v) \not\equiv f^0(v)$. Comparing (5) and
(25), it follows from Theorem 2(iii) and (23) that

(26)  $r(f^0 \mid \omega_k) = r(f^0 \mid \omega'_k) < \max[r(f \mid \omega_k), r(f \mid \omega'_k)].$

Clearly, $f(v)$ cannot be uniformly better than $f^0(v)$ in $\Omega$. Again, from (26),

(27)  $r(f^0 \mid \omega_k) < \sup_{\omega} [r(f \mid \omega)],$

so that, since $k$ is arbitrary,

(28)  $\sup_{\omega} [r(f^0 \mid \omega)] = \lim_{k} r(f^0 \mid \omega_k) \le \sup_{\omega} [r(f \mid \omega)].$

Since $f(v) \not\equiv f^0(v)$ in the preceding argument is arbitrary, we have shown that
(a) no $f(v)$ can be uniformly better than $f^0(v)$ and (b) $\sup_{\omega} [r(f^0 \mid \omega)] = \inf_{f} \sup_{\omega} [r(f \mid \omega)]$,
i.e. that $f^0(v)$ is admissible and minimax. The last part of the theorem
follows upon setting $\omega_k = \omega_0$ in (27). This completes the proof of Theorem 3.
The conditions on $\Omega$ for $f^0(v)$ to be the unique minimax decision function may
be written as follows:
There exist $\omega_0 = (m_1, m_2 : \sigma)$, $\omega'_0 = (\mu_1, \mu_2 : \sigma)$ in $\Omega$ such that

(29)  (i) $r(f^0 \mid \omega_0) \, (= r(f^0 \mid \omega'_0)) = \sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \quad (\ne 0, \infty),$
      (ii) $\mu_1 = m_1 - \dfrac{2 n_2}{n_1 + n_2}(m_1 - m_2), \quad \mu_2 = m_2 + \dfrac{2 n_1}{n_1 + n_2}(m_1 - m_2),$
      (iii) $h(\omega_0) = h(\omega'_0).$

For the important risk functions (1) and (6), (29)(ii) implies (29)(iii) (i.e. $h(\omega)$
depends on $|m_1 - m_2|$ alone). Moreover, when $n_1 = n_2$, (29)(ii) becomes $\mu_1 = m_2$,
$\mu_2 = m_1$. Thus for (1) and (6), when $n_1 = n_2$ the conditions (29) reduce simply
to the condition that at least two points in $\Omega$ at which the risk for $f^0(v)$ is maximum
be image points of one another in the plane $\{\omega : m_1 = m_2\}$. In particular, it follows
that if $n_1 = n_2$ and if the given set $\Omega$ is "symmetric" in the sense that whenever
$(m_1, m_2 : \sigma)$ is in $\Omega$ then $(m_2, m_1 : \sigma)$ is also in $\Omega$, then $f^0(v)$ is the unique minimax
decision function provided that it attains its maximum risk in $\Omega$, the risk function
in question being (1) or (6). There are obvious modifications (involving two
sequences of points in $\Omega$) of these remarks which assert that $f^0(v)$ is at least an
admissible minimax decision function in case $f^0(v)$ does not attain its maximum
risk in $\Omega$.
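
The companion point required by Theorem 3(ii) is determined by two linear conditions at a common $\sigma$: $\mu_1 - \mu_2 = -(m_1 - m_2)$, which is $d(\omega_0) = -d(\omega'_0)$, and $n_1 m_1 + n_2 m_2 = n_1 \mu_1 + n_2 \mu_2$. The sketch below solves them; the function name `companion_point` is an illustrative choice.

```python
def companion_point(m1, m2, n1, n2):
    """Solve mu1 - mu2 = -(m1 - m2) and n1*mu1 + n2*mu2 = n1*m1 + n2*m2."""
    diff = m1 - m2
    total = n1 + n2
    mu1 = m1 - 2.0 * n2 * diff / total
    mu2 = m2 + 2.0 * n1 * diff / total
    return mu1, mu2


if __name__ == "__main__":
    print(companion_point(1.0, 3.0, 5, 5))    # equal sizes: the means are swapped
    print(companion_point(1.0, 3.0, 10, 5))   # unequal sizes: a skewed reflection
```

With equal sample sizes the companion is the mirror image $(m_2, m_1)$; with unequal sizes it is shifted so that the weighted sum of the means is preserved.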

We shall now state the result analogous to Theorem 3 for the case when one
of the means is known exactly, say $m_2 = c$. The decision function $f_c^0(v)$ is defined
by (3).

THEOREM 4. Suppose that there exist sequences $\{\omega_k\}$, $\{\omega'_k\}$ of points
$\omega_k = (c + a_k, c : \sigma_k)$, $\omega'_k = (c - a_k, c : \sigma_k)$ in $\Omega$ such that

(i) $\lim_{k \to \infty} r(f_c^0 \mid \omega_k) = \sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)] \quad (\ne 0, \infty),$

(ii) $h(\omega_k) = h(\omega'_k)$ for every $k = 1, 2, \ldots$

Then $f_c^0(v)$ is an admissible minimax decision function. If there exist
$\omega_0 = (c + a, c : \sigma)$, $\omega'_0 = (c - a, c : \sigma)$ in $\Omega$ satisfying (i) and (ii), then $f_c^0(v)$ is the unique minimax
decision function.

The proof (based on Theorem 2(iii)) is similar to that of Theorem 3 and will
be omitted. Note that for the risk functions (1) and (6), condition (ii) is automatically
satisfied.

The reader will have observed that results which may be obtained from
Theorem 2(iii) in the manner of Theorems 3 and 4 will assert the optimal character
of decision functions which are characteristic functions of sets of the type
$\{v : a\bar x_1 + b\bar x_2 > c\}$. The following example, cited as Example (iii) of Section 1,
shows that for arbitrary $\Omega$ the optimum decision function need not be of this
type.

Suppose that $n_1 = n_2 = n$, that $\Omega$ consists of the three points

$\omega_0 = (\tfrac12, -\tfrac12 : 1), \quad \omega_1 = (\tfrac12, \tfrac32 : 1), \quad \omega_2 = (-\tfrac32, -\tfrac12 : 1),$

and that the risk function under consideration is given by (1) or (6). Then the
unique minimax decision function is $f^{**}(v)$ given by (4), where $\lambda > 0$ is determined
by

(30)  $E[1 - f^{**} \mid \omega_0] = E[f^{**} \mid \omega_1].$

The proof follows. $f^{**}(v)$ is the characteristic function of the set
$\{v : \phi(v) > c_1 \phi_1(v) + c_2 \phi_2(v)\}$, where $\phi, \phi_1, \phi_2$ are the frequency functions of the probability
distributions in $E_{2n}$ corresponding to the parameter points $\omega_0, \omega_1, \omega_2$ respectively,
with $c_1 = c_2 = e^{n}/\lambda$. Since for all $\lambda > 0$,

$E[f^{**} \mid \omega_1] = E[f^{**} \mid \omega_2],$

and since a unique $\lambda > 0$ satisfying (30) certainly exists, it follows (cf. (19) and
(C)) that

$r(f^{**} \mid \omega_0) = r(f^{**} \mid \omega_1) = r(f^{**} \mid \omega_2) = \beta,$

say. Let $f(v)$ be any decision function $\not\equiv f^{**}(v)$. We shall show that

(31)  $\beta < \max[r(f \mid \omega_0), r(f \mid \omega_1), r(f \mid \omega_2)].$

Suppose not. Then

$r(f \mid \omega_1) = E[f \mid \omega_1] \le E[f^{**} \mid \omega_1] = r(f^{**} \mid \omega_1),$
$r(f \mid \omega_2) = E[f \mid \omega_2] \le E[f^{**} \mid \omega_2] = r(f^{**} \mid \omega_2).$

Then, by Theorem 1, Note 2, we must have $E[f \mid \omega_0] < E[f^{**} \mid \omega_0]$, so that

$r(f \mid \omega_0) = 1 - E[f \mid \omega_0] > 1 - E[f^{**} \mid \omega_0] = r(f^{**} \mid \omega_0) = \beta,$

contrary to hypothesis. Hence (31) holds, and since $f(v) \not\equiv f^{**}(v)$ is arbitrary
our assertion is proved. (Note that

$r(f^0 \mid \omega_0) = r(f^0 \mid \omega_1) = r(f^0 \mid \omega_2)$

also, so that $f^{**}(v)$ is uniformly better than $f^0(v)$ in $\Omega$.) We remind the reader
that $f^{**}(v)$ remains the unique minimax decision function with respect to (1)
or (6) and any $\Omega$ which contains $\omega_0, \omega_1, \omega_2$, and is such that $\sup_{\omega \in \Omega} [r(f^{**} \mid \omega)] = \beta$.
Whether a set $\Omega$ satisfies the last condition will in general depend on whether the
risk function in question is (1) or (6).
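
The constant $\lambda$ in (30) has no closed form, but it can be approximated numerically. The sketch below is an illustration under stated assumptions: the three points are taken to be $(\tfrac12, -\tfrac12 : 1)$, $(\tfrac12, \tfrac32 : 1)$, $(-\tfrac32, -\tfrac12 : 1)$, the acceptance region is $e^{-2n\bar x_1} + e^{2n\bar x_2} < \lambda$, and the probabilities are obtained by a midpoint rule over $\bar x_1$. The function names and the bracket for $\log \lambda$ (adequate for small $n$) are choices made here, not taken from the paper.

```python
import math


def G(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def prob_choose_1(lam, m1, m2, n, steps=4000):
    """P(exp(-2n*xbar1) + exp(2n*xbar2) < lam), xbar_i ~ N(m_i, 1/n) independent."""
    sd = 1.0 / math.sqrt(n)
    lo, hi = m1 - 8.0 * sd, m1 + 8.0 * sd
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x1 = lo + (k + 0.5) * dx
        rem = lam - math.exp(-2.0 * n * x1)
        if rem <= 0.0:
            continue                          # no value of xbar2 satisfies the inequality
        t = math.log(rem) / (2.0 * n)         # the event requires xbar2 < t
        dens = math.exp(-0.5 * ((x1 - m1) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += dens * G((t - m2) / sd) * dx
    return total


def solve_lambda(n):
    """Bisection on log(lambda) for equation (30); returns (lambda, beta)."""
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lam = math.exp(mid)
        gap = (1.0 - prob_choose_1(lam, 0.5, -0.5, n)) - prob_choose_1(lam, 0.5, 1.5, n)
        if gap > 0.0:
            lo = mid      # risk at w0 still exceeds risk at w1: enlarge the region
        else:
            hi = mid
    lam = math.exp(0.5 * (lo + hi))
    return lam, prob_choose_1(lam, 0.5, 1.5, n)


if __name__ == "__main__":
    lam, beta = solve_lambda(1)
    print("lambda:", lam, "beta:", beta, "risk of f0:", G(-math.sqrt(0.5)))
```

For comparison, the rule $f^0$ has risk $G(-\sqrt{n/2})$ at each of the three points under these assumptions, so the printed $\beta$ should fall below that value.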


3. Examples and discussion. In this section we shall discuss the relevance of 
Theorems 3 and 4 to two specific problems of the greater mean. The examples 
given are purely illustrative and the reader will readily construct others in which 
the statistician is faced with similar problems of decision. 

EXAMPLE 1. A farmer F has tested two varieties $\pi_1, \pi_2$ of grain in a field
experiment in which $n_i$ plots were assigned to $\pi_i$, $i = 1, 2$, all plots being of equal
area. The plot yields obtained were $y_{11}, y_{12}, \ldots, y_{1n_1}$ and $y_{21}, y_{22}, \ldots, y_{2n_2}$
bushels respectively. F gives this data to a statistician S for analysis. F is willing
to assume that the yields per plot for each of the two varieties are normally distributed
with unknown means $\mu_1, \mu_2$ and a common variance, also unknown.
F says he is particularly interested in whether the two varieties are "significantly
different."

S is well aware that F's interest in the varieties is not purely scientific, that
is to say, F did not perform the field experiment for the sole purpose of estimating
the unknown parameters or testing hypotheses concerning them. S also knows
that it is very unlikely that $\mu_1$ is equal to $\mu_2$.

Suppose that in fact F wishes to decide which variety he should use next
year on his land in order to make the maximum possible profit, and is afraid
that if he were to act as if the observed mean yields $\bar y_1, \bar y_2$ were the true population
mean yields, he might make a gross error. So F is willing to compromise
between the two varieties (that is, he will assign some fraction $f$ of his land to
$\pi_1$ and the rest to $\pi_2$) in case S declares that there is no evidence of the two varieties
being different.
If this is the case, S should ask F how much it costs him to use $\pi_i$ and the
price at which he expects to sell his grain. Supposing that these quantities are
$a_i$ dollars per acre and $b$ dollars per bushel respectively, and that the area of each
plot in the field experiment was $c$ acres, S will set

$m_i$ = expected profit per acre in using variety $\pi_i$
      $= (b/c)\mu_i - a_i$ dollars $(i = 1, 2)$,
$\omega = (m_1, m_2 : \sigma)$, $\sigma^2$ being the variance of the profit per acre
      in using $\pi_i$ $(i = 1, 2)$,
$\gamma(\omega) = (m_1, m_2)$ (see Section 2, (C)),
$x_{ij} = (b/c) y_{ij} - a_i$, $\bar x_i = n_i^{-1} \sum_{j} x_{ij}$, $v = (x_{11}, \ldots, x_{1n_1}; x_{21}, \ldots, x_{2n_2})$,

so that $r(f \mid \omega)$ is given by (1) and is equal to the expected loss (in terms of profit
per acre) incurred by using the proportions $f(v)$, $1 - f(v)$ of the varieties $\pi_1, \pi_2$
as compared with using the variety with the greater mean for the whole of the
land. Then if S is satisfied that the set $\Omega$ of possible points $\omega$ satisfies the conditions
of Theorem 3 he should recommend that F use $\pi_1$ alone if $\bar x_1 > \bar x_2$, and
$\pi_2$ alone if $\bar x_2 > \bar x_1$, this being the safest procedure in the sense that it is the
minimax strategy (cf. Example 1 in [8]).
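
The conversion from plot yields to profits per acre, followed by the comparison of mean profits, can be sketched as follows. The function names and the numbers in the usage example are hypothetical; only the transformation $x_{ij} = (b/c) y_{ij} - a_i$ and the comparison of $\bar x_1$ with $\bar x_2$ come from the text.

```python
def profits_per_acre(yields, cost_per_acre, price_per_bushel, plot_acres):
    """Apply x = (b/c)*y - a to every plot yield y (bushels per plot)."""
    scale = price_per_bushel / plot_acres
    return [scale * y - cost_per_acre for y in yields]


def recommend_variety(yields1, yields2, a1, a2, b, c):
    """Return 1 if the mean profit of variety 1 exceeds that of variety 2, else 2."""
    x1 = profits_per_acre(yields1, a1, b, c)
    x2 = profits_per_acre(yields2, a2, b, c)
    xbar1 = sum(x1) / len(x1)
    xbar2 = sum(x2) / len(x2)
    return 1 if xbar1 > xbar2 else 2


if __name__ == "__main__":
    # Equal mean yields, but variety 1 is more expensive to grow.
    print(recommend_variety([10.0, 12.0], [11.0, 11.0], a1=5.0, a2=2.0, b=2.0, c=0.5))
```

The example shows why the comparison is made on profits rather than raw yields: two varieties with the same mean yield can still differ in expected profit.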


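We shall illustrate by a simple example the obvious method of verifying
whether $f^0(v)$ is the minimax decision function for a given $\Omega$. We have by (22),
using the risk function (1) obtained by setting $\gamma(\omega) = (m_1, m_2)$,

(32)  $r(f^0 \mid \omega) = h(\omega) G(-|d(\omega)|) = |m_1 - m_2| \, G\Big( -\Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{-1/2} |m_1 - m_2| / \sigma \Big).$

Now suppose that

(33)  $\Omega = \Big\{ \omega : a - \frac{l}{2} \le m_1 \le a + \frac{l}{2}, \; b - \frac{l}{2} \le m_2 \le b + \frac{l}{2} : \sigma_0 - \rho \le \sigma \le \sigma_0 \Big\}, \quad l > |a - b|,$

where $a, b, l, \sigma_0, \rho \,(> 0)$ are certain constants. By (32), the maximum risk occurs
at some points in $\Omega$ for which $\sigma = \sigma_0$. We have

(34)  $r(f^0 \mid \omega, \sigma = \sigma_0) = \sigma_0 \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{1/2} \cdot x \, G(-x),$

where

$x = x(\omega) = \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{-1/2} |m_1 - m_2| / \sigma_0.$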
If $a = b$ and $n_1 = n_2$ we see from the remark following (29) that $f^0(v)$ is the unique
minimax decision function. Suppose therefore that $a \ne b$ or $n_1 \ne n_2$ or both.
Now

(35)  $\sup_{x} [x G(-x)] = x_0 G(-x_0) = .1700$ (approx.),

where $x_0 = .7518$ (approx.). If $m_1, m_2$ were unrestricted, $r(f^0 \mid \sigma = \sigma_0)$ would
be a maximum when $|m_1 - m_2| = x_0 \sigma_0 \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{1/2}$, by (34) and (35). Hence $f^0(v)$
will be the unique minimax decision function if these two lines intersect the square
$\Big\{ a - \frac{l}{2} \le m_1 \le a + \frac{l}{2}, \; b - \frac{l}{2} \le m_2 \le b + \frac{l}{2} \Big\}$ in such a way that at least two
points lying on these lines and in the square satisfy (29)(ii). This will be the case if

(36)  $l > \max\Big[ |a - b| + y_0, \; \max(|a - b|, y_0) + \Big| \frac{n_1 - n_2}{n_1 + n_2} \Big| y_0 \Big],$

where

$y_0 = x_0 \sigma_0 \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{1/2}.$

We have assumed that $l > |a - b|$, for otherwise either $m_1 < m_2$ or $m_1 > m_2$
for all $\omega$ in $\Omega$, and there is no problem. It is therefore clear that for $n_1$ and $n_2$
sufficiently large, $f^0(v)$ will be the unique minimax decision function. That (36)
is not a very strong requirement may be seen by setting $a = b$, $n_1 = 2n_2$, in
which case (36) reduces to

$l > \sigma_0 \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{1/2}$ (approx.).
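
The constants in (35) can be reproduced by maximizing $x G(-x)$, whose derivative $G(-x) - x g(x)$ (with $g$ the standard normal density) has a single zero. The sketch below finds that zero by bisection and then evaluates a sufficient condition of the form $l > \max[\,|a - b| + y_0,\ \max(|a - b|, y_0) + |(n_1 - n_2)/(n_1 + n_2)|\, y_0\,]$ with $y_0 = x_0 \sigma_0 (1/n_1 + 1/n_2)^{1/2}$, which is how condition (36) is read here. Function names are illustrative.

```python
import math


def G(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def maximize_x_G():
    """Return (x0, x0*G(-x0)) where x0 maximizes x*G(-x) over x > 0."""
    lo, hi = 0.1, 2.0            # the derivative changes sign once on this interval
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if G(-mid) - mid * normal_pdf(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    x0 = 0.5 * (lo + hi)
    return x0, x0 * G(-x0)


def condition_36(l, a, b, n1, n2, sigma0):
    """Evaluate the sufficient condition on the side length l of the square."""
    x0, _ = maximize_x_G()
    y0 = x0 * sigma0 * math.sqrt(1.0 / n1 + 1.0 / n2)
    skew = abs(n1 - n2) / (n1 + n2)
    return l > max(abs(a - b) + y0, max(abs(a - b), y0) + skew * y0)


if __name__ == "__main__":
    print(maximize_x_G())
    print(condition_36(0.4, 0.0, 0.0, 20, 10, 1.0))
```

With $a = b$ and $n_1 = 2 n_2$ the bound becomes $\tfrac43 y_0$, which is within a fraction of a per cent of $\sigma_0 (1/n_1 + 1/n_2)^{1/2}$.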

We remark that $f^0(v)$ remains the unique minimax decision function for any
$n_1, n_2$ "when $l = \infty$," so that $\Omega$ is given by

(33')  $\Omega = \{ \omega : -\infty < m_1 < \infty, \; -\infty < m_2 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0 \}.$


It is of interest to consider the "one sample" case when one of the means is
known, say $m_2 = c$. This will be the case (approximately) if $\pi_2$ is a standard
variety which has been in use for some time and $\pi_1$ is a new variety. The analogue
of the parameter space discussed above is then

(37)  $\Omega = \Big\{ \omega : m_2 = c, \; a - \frac{l}{2} \le m_1 \le a + \frac{l}{2} : \sigma_0 - \rho \le \sigma \le \sigma_0 \Big\}, \quad \frac{l}{2} > |a - c|.$

By using Theorem 4 it can be seen that $f_c^0(v)$ as defined by (3) is the unique minimax
decision function if $c = a$ or if $c$ is not necessarily equal to $a$, but

(38)  $\frac{l}{2} - |a - c| > x_0 \sigma_0 \Big( \frac{1}{n_1} \Big)^{1/2},$

where $x_0$ is given by (35). Since the left-hand side of (38) is positive, it is clear
that $f_c^0(v)$ will be the unique minimax decision function with respect to (37) if
$n_1$ is sufficiently large. Note that $f_c^0(v)$ is the unique minimax decision function
for any $n_1$ when $l = \infty$ and $\Omega$ is given by

(37')  $\Omega = \{ \omega : m_2 = c, \; -\infty < m_1 < \infty : \sigma_0 - \rho \le \sigma \le \sigma_0 \}.$
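
How large $n_1$ must be in the one-sample case can be read off a condition of the form $l/2 - |a - c| > x_0 \sigma_0 n_1^{-1/2}$, which is how (38) is taken here (with a strict inequality). The sketch below solves it for the smallest integer $n_1$; the function name and the numerical inputs are illustrative, and $x_0 = .7518$ is the approximate constant from (35).

```python
import math

X0 = 0.7518  # approximate maximizer of x*G(-x), as quoted in (35)


def min_n1(l, a, c, sigma0):
    """Smallest integer n1 with l/2 - |a - c| > X0 * sigma0 / sqrt(n1)."""
    margin = l / 2.0 - abs(a - c)
    if margin <= 0.0:
        raise ValueError("need l/2 > |a - c| for the problem to be non-trivial")
    bound = (X0 * sigma0 / margin) ** 2    # the condition is n1 > bound
    return math.floor(bound) + 1


if __name__ == "__main__":
    print(min_n1(l=2.0, a=0.0, c=0.5, sigma0=1.0))
```

The required sample size grows as the known mean $c$ moves toward the edge of the interval allowed for $m_1$, since the margin on the left of the inequality shrinks.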


The reader may find it instructive to consider other plausible sets $\Omega$ which
satisfy the conditions of Theorems 3 and 4 and also some which do not, assuming
$\sigma = 1$ for simplicity. It should be observed that no matter what $\Omega$ may be, provided
only that $\sigma \le \sigma_0$ for all $\omega$ in $\Omega$, we shall have by (32) and (35)

$\sup_{\omega \in \Omega} [r(f^0 \mid \omega)] \le .1700 \cdot \sigma_0 \cdot \Big( \frac{1}{n_1} + \frac{1}{n_2} \Big)^{1/2}$ (approx.).

In a similar way it can be seen that for any $\Omega$ in which $m_2$ equals $c$ and $\sigma \le \sigma_0$

$\sup_{\omega \in \Omega} [r(f_c^0 \mid \omega)] \le .1700 \cdot \sigma_0 \cdot \Big( \frac{1}{n_1} \Big)^{1/2}$ (approx.).

EXAMPLE 2. $\pi_1$ and $\pi_2$ are two soporific drugs, the random variables generated
by them being the duration of sleep induced by a standard dose in an individual
chosen at random. It is assumed that these two populations are normal with
unknown means $m_1, m_2$ and a common variance $\sigma^2$, also unknown. In a series
of independent trials in which $n_1$ individuals received the first drug and $n_2$ the
second, the outcome was $v = (x_{11}, x_{12}, \ldots, x_{1n_1}; x_{21}, x_{22}, \ldots, x_{2n_2})$. The
statistician S is required to say which is the more effective drug.

Here a reasonable risk function is (6), where $f(v)$ takes on only the values
0, 1, corresponding to the decisions "$m_1 < m_2$" and "$m_1 > m_2$" respectively.⁴
The problem of choosing $f(v)$ so as to minimize this risk was considered by Simon
[4]. He showed that in case $n_1 = n_2$, $f^0(v)$ is the uniformly best decision function
in the class of symmetric decision functions. (Given $n_1 = n_2 = n$, a decision
function $f(v)$ is said to be symmetric if $f(x_{11}, x_{12}, \ldots, x_{1n}; x_{21}, x_{22}, \ldots, x_{2n}) =
1 - f(x_{21}, x_{22}, \ldots, x_{2n}; x_{11}, x_{12}, \ldots, x_{1n})$. See also [3].) It is natural to confine
oneself to the class of symmetric decision functions when the sample sizes are
equal, but under the implicit assumption that if $\omega = (a, b : \sigma)$ is a possible parameter
point, then $\omega' = (b, a : \sigma)$ is also (cf. the remarks following (29)). The
illustrations in Section 1 show that if the sample sizes are unequal or if $\Omega$ is not
symmetric in the sense just described, there may exist decision functions which
are uniformly better than $f^0(v)$: in (i) we have a "symmetric" $\Omega$ but $n_1 \ne n_2$; in
(iii), $n_1 = n_2$ but $\Omega$ is not "symmetric."

However, $f^0(v)$ is an admissible minimax decision function no matter what
the sample sizes, provided only that $\Omega$ satisfies a certain not too restrictive
condition. We have

(39)    $\bar{r}(f^0 \mid \omega) = \begin{cases} G(-|d(\omega)|) & \text{for } m_1 \ne m_2, \\ 0 & \text{for } m_1 = m_2. \end{cases}$


⁴ For some purposes it would be more appropriate to take (1) as the risk function for this
problem, letting the decision functions $f(v)$ take on only the values 0 and 1. We have
(essentially) discussed this case in the previous example.

It is clear that if $\{\omega_k\}$ is a sequence of points in $\Omega$ such that

    $\lim_{k \to \infty} d(\omega_k) = 0$, then $\lim_{k \to \infty} \bar{r}(f^0 \mid \omega_k) = \frac{1}{2} = \sup_{\omega \in \Omega} [\bar{r}(f^0 \mid \omega)]$.

Therefore, by Theorem 3, $f^0(v)$ is admissible and minimax if some point in the
plane $\{\omega : m_1 = m_2\}$ is an interior point of the set $\Omega$ of possible parameter points
(in fact it is sufficient if some plane $\sigma = \sigma_0\ (> 0)$ intersects $\Omega$ in a set which
has an interior point on the line $m_1 = m_2$). Hence if nothing much is known
about the two drugs, S could regard the foregoing as a justification for asserting
“$m_1 > m_2$” if $\bar{x}_1 > \bar{x}_2$ and “$m_1 < m_2$” otherwise.

We have given no criterion for the choice of a suitable decision function when
two or more admissible minimax decision functions exist, and our diffidence in
recommending the use of $f^0(v)$ in the present case is due to the fact that under
the condition stated above there will exist decision functions other than $f^0(v)$
which are also admissible and minimax with respect to (6). Let us suppose that
$\Omega$ is given by (33). Then $f^0(v)$ is admissible and minimax, by the preceding
paragraph. However, it follows from Theorem 4 that each of

    $f_{c_1}(v) = \begin{cases} 1 & \text{if } \bar{x}_1 > c_1, \\ 0 & \text{otherwise,} \end{cases}$  and  $f^{c_2}(v) = \begin{cases} 0 & \text{if } \bar{x}_2 > c_2, \\ 1 & \text{otherwise,} \end{cases}$

is also admissible and minimax, where $c_1$ and $c_2$ are arbitrary constants with
max [a, b] — ; <4, ¢2 < min f{a, b] + 5. 

There is, however, some reason for preferring $f^0(v)$ to other decision functions
in the present case. S has been asked to give his opinion as to which is the better
drug, and presumably no immediate consequences follow from the opinion which
he might express. (This would not be the case if there were a sleepless individual
on hand who had to be given a dose of one of the two drugs. Cf. footnote 4.)
Although the problem is of a scientific nature, insistence upon literal exactitude
in the interpretation of “incorrect decision” is meaningful only insofar as it is
compatible with the physical situation. In view of the limited determinacy of
unknown parameters in general, and of the limitations of experiments on soporific
drugs in particular, it may be possible and even desirable to modify (6) in such
a way that for any fixed $\sigma$ the risk tends to zero with $|m_1 - m_2|$. Thus modified,
the risk function would be essentially similar to (1). A rather drastic way of
introducing this modification would be to agree that the assertion of equality
of the two means does not constitute an error in case $|m_1 - m_2| \le \varepsilon$, where $\varepsilon$ is
some positive constant. S will then take

(40)    $\bar{r}_\varepsilon(f \mid \omega) = \begin{cases} \bar{r}(f \mid \omega) & \text{if } |m_1 - m_2| > \varepsilon, \\ 0 & \text{otherwise,} \end{cases}$

as the risk function. (Note that in using $\bar{r}_\varepsilon(f \mid \omega)$ rather than $\bar{r}(f \mid \omega)$, S has in
effect deleted the set $\{\omega : |m_1 - m_2| \le \varepsilon\}$ from the given set $\Omega$ by defining
$y(\omega) = (0, 0)$ there, instead of only when $m_1 = m_2$ as in the case of $\bar{r}(f \mid \omega)$. Cf. “zones of
indifference,” [5, pp. 27-30].) It follows from Theorem 3 that $f^0(v)$ is the unique
minimax decision function with respect to (40) and (33) if $a = b$ and $n_1 = n_2$,
and also if at least one of these conditions does not hold but

rm —- NM | 
M1 + Ne re 


Thus $f^0(v)$ will be the unique minimax decision function no matter what $n_1$,
$n_2$, $a$, $b$ or $l$ may be, provided only that $\varepsilon$ is sufficiently small. We shall leave
other modifications of $\bar{r}(f \mid \omega)$ and discussion of $\bar{r}(f \mid \omega)$ with respect to other
types of parameter spaces (e.g. (37)) to the reader.

We conclude this discussion with a remark on the proper choice of $n_1$ and $n_2$
in using $f^0(v)$ when the risk function belongs to the class R defined in Section
2, (C). (The risk functions (1) and (6) belong to R.) Suppose that before
experimentation starts, it is agreed that one must have $n_1 + n_2 = 2k$, where $k$ is a
fixed integer. In that case, choosing $n_1 = n_2 = k$ will be the best choice of $n_1$,
$n_2$ in the following sense. (a) For any fixed $\omega$, $r(f^0 \mid \omega)$, which is the expected loss,
then becomes a minimum. This follows immediately from (22), since


1 > max||a — | +6 max (la —6),0 + 








- 

    $r(f^0 \mid \omega) = h(\omega)\, G(-|d(\omega)|), \qquad |d(\omega)| = \left(\frac{1}{n_1} + \frac{1}{n_2}\right)^{-1/2} |m_1 - m_2| / \sigma,$

and $|d(\omega)|$ has its maximum when $n_1 = n_2 = k$. (b) For any fixed $\omega$, the variance
of the loss also becomes a minimum. In using $f^0(v)$, the loss takes the values 0
and $h(\omega)$ only, with $P(\text{loss} = h(\omega) \mid \omega) = G(-|d(\omega)|) = \alpha$ say. Therefore,
the variance of the loss is $h^2 \alpha (1 - \alpha)$. Since $\alpha \le \frac{1}{2}$, this expression increases with
increasing $\alpha$, and so has its minimum when $n_1 = n_2 = k$. This remark is, of course,
without prejudice to the question of whether $f^0(v)$ is admissible and minimax with
respect to a given $\Omega$ for every $n_1$ and $n_2$ with $n_1 + n_2 = 2k$.
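Both parts of this remark can be checked numerically for a given total $2k$. The following is a minimal sketch, assuming that $G$ is the standard normal distribution function; the values chosen for $k$, $m_1 - m_2$ and $\sigma$ are arbitrary illustrations, and the function names are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function (assumed to be the paper's G)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def abs_d(n1: int, n2: int, diff: float, sigma: float) -> float:
    """|d(w)| = (1/n1 + 1/n2)^(-1/2) * |m1 - m2| / sigma."""
    return abs(diff) / (sigma * math.sqrt(1.0 / n1 + 1.0 / n2))

def error_prob(n1: int, n2: int, diff: float, sigma: float) -> float:
    """alpha = G(-|d(w)|), the probability of incurring the loss h(w)."""
    return normal_cdf(-abs_d(n1, n2, diff, sigma))

k, diff, sigma = 10, 0.5, 1.0  # illustrative values only
rows = []
for n1 in range(1, 2 * k):
    n2 = 2 * k - n1
    alpha = error_prob(n1, n2, diff, sigma)
    # h(w) is fixed for fixed w, so alpha * (1 - alpha) orders the variances
    rows.append((n1, n2, alpha, alpha * (1.0 - alpha)))

best_alpha = min(rows, key=lambda r: r[2])
best_var = min(rows, key=lambda r: r[3])
print("smallest alpha at", best_alpha[:2], "; smallest variance at", best_var[:2])
```

For these illustrative values both minima fall at the equal allocation $n_1 = n_2 = k$, as parts (a) and (b) assert.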


4. A remark on randomized decision functions. In the foregoing discussion
we have confined attention to the class of non-randomized decision functions:
the space of possible decisions being some subset of $0 \le f \le 1$, the statistician
constructs (in advance) a suitable decision function $f(v)$, obtains a particular
sample point $v$ by sampling the two populations, and takes $f(v)$ as his decision.
It is, however, of some theoretical interest to consider more general formulations
in which the decision arrived at by the statistician may be a random function
of the sample point $v$.

A randomized decision function can be defined in several ways. One definition
is as follows. Let $\phi(z \mid v)$ be a function defined for all $v$ in $E_N$ and all real $z$ such
that for any fixed $z$ it is a measurable function of $v$, and such that for any fixed
$v$ it is the distribution function of a random variable with values in $0 \le z \le 1$.
We shall denote this random variable by $Z_\phi(v)$ and call it a (randomized) decision
function. In using it, the statistician first obtains a particular point $v$ by sampling
the two populations, then performs a random experiment whose outcome $Z$
has the known distribution function $P(Z \le z) = \phi(z \mid v)$, and takes $Z$ as his
decision. The class of all decision functions corresponding to all functions $\phi(z \mid v)$
will be denoted by $\{Z_\phi(v)\}$. It is clear that this class includes the class of
non-randomized decision functions.

This definition of the structure of randomized decision functions follows the
method described by Halmos and Savage in their interesting remarks ([6], pp.
239-241) on the value of sufficient statistics in statistical methodology. For
any $Z_\phi(v)$, we have

(41)    $P(Z_\phi(v) \le z \mid \omega) = \int_{E_N} P(Z_\phi(v) \le z \mid \omega, v)\, dK(v \mid \omega) = \int_{E_N} \phi(z \mid v)\, dK(v \mid \omega).$

We shall now show that in all problems of the greater mean in which the
methods of Section 2 can be applied to non-randomized decision functions,
randomization cannot be recommended. More precisely, the following holds.

THEOREM. Let $f(v)$ be a non-randomized decision function which takes on only
the values 0 and 1 and which is the unique non-randomized decision function whose
expected value $E[f \mid \omega]$ satisfies a certain condition Q as a function of $\omega$. Then $f(v)$
is the unique decision function whose expected value satisfies the condition Q; i.e. if
$Z_\phi(v)$ is a decision function such that $E[Z_\phi \mid \omega]$ satisfies Q, then

(42)    $P(f(v) = Z_\phi(v) \mid \omega) = 1$ for all $\omega$.

It follows in particular that Theorem 2 remains valid with the arbitrary
non-randomized $f(v)$ replaced by an arbitrary $Z_\phi(v)$, and in consequence, Theorems 3 and 4
remain valid when the class of decision functions in question is $\{Z_\phi(v)\}$.

PROOF. Let $Z_\phi(v)$ be a decision function whose expected value satisfies the
condition Q. Now, by (41) and Theorem 5 of [7] we have

(43)    $E[Z_\phi \mid \omega] = \int_{E_N} f^\phi(v)\, dK(v \mid \omega) = E[f^\phi \mid \omega],$

where

(44)    $f^\phi(v) = \int_0^1 z\, d_z\phi(z \mid v), \qquad 0 \le f^\phi(v) \le 1.$

It is clear from (43) that $E[f^\phi \mid \omega]$ satisfies Q and so we must have

(45)    $f^\phi(v) = f(v)$ a.e.

by hypothesis. Since $f(v)$ takes on only the values 0 and 1, it follows from (44)
and (45) that

    $\int_{\{z = f(v)\}} d_z\phi(z \mid v) = 1$ a.e.,

which implies (42). In order to verify the last part of the remark, consider any
particular problem of the greater mean. The risk function of any decision
function $Z_\phi(v)$ is, by (15),

    $r(Z_\phi \mid \omega) = W(\omega, E[Z_\phi \mid \omega]).$

Hence a condition on the risk function of $Z_\phi$ is equivalent to a condition on
$E[Z_\phi \mid \omega]$ as a function of $\omega$, and the truth of the remark follows by appropriate
definition of the condition Q in terms of the risk function.


ANALYSIS OF EXTREME VALUES

By W. J. DIXON¹

University of Oregon


1. Introduction. It is well recognized by those who collect or analyze data 
that values occur in a sample of n observations which are so far removed from 
the remaining values that the analyst is not willing to believe that these values 
have come from the same population. Many times values occur which are “dubious”
in the eyes of the analyst and he feels that he should make a decision as
to whether to accept or reject these values as part of his sample. On the other 
hand he may not be looking for an error, but may wish to recognize a situation 
when an occasional observation occurs which is from a different population. 
He may wish to discover whether a significant analysis of variance indicates an 
extreme value significantly different from the remainder. Also, of course, the 
extreme value may differ significantly without causing a significant analysis 
of variance and he may wish to discover this. It is reasonable to suppose that a 
criterion for rejecting observations would be useful here also. The choice of a
suitable criterion for rejecting observations introduces a number of questions. 

1. Should any observations be removed if we wish a representative sample in- 
cluding whatever contamination arises naturally? In other words, it may be 
desirable to describe the population including all observations, for only in that 
way do we describe what is actually happening. 

2. If the analyst wishes to sample the population unaffected by contamination 
he must either remove the contaminating items or employ statistical procedures 
which reduce to a minimum the effect of the contamination on the estimates of 
the population. That is, he may wish to describe only 95% of his population 
if the description is altered radically by the remaining 5% of the observations. 
He may have external reasons which are good and sufficient for wishing to describe
only 95% of his observations. Suppose he wishes to use the sample for a
statistical inference; the inclusion of all the data may sufficiently violate the 
assumptions underlying the inference to exclude the possibility of making a valid 
inference. 

This paper will concern itself only with those problems which arise from
Question 2.

If we wish to follow some procedure which attempts to remove contamination 
we must consider the performance of any proposed criterion with respect to the 
proportion of contamination the criterion will discover and, of course, the propor- 
tion of the “good” observations which are removed by the use of the criterion. 
But, perhaps more important, we must consider what sort of bias will result 
when the standard statistical procedures are applied to samples of observations 
which have been processed in this manner. 


¹ This paper was prepared under a contract with the Office of Naval Research.

If we wish to follow a procedure which will not search for particular values to
be excluded but will minimize their effect if present, we must investigate the
sampling distributions of these modified statistics and estimate the loss in
information resulting from their use when all observations are “good.” We must
also investigate the expected bias which will result when “bad” items are present
even though essentially excluded. Perhaps most disturbing about the avoidance
of “bad” items is the fact that a decision must still be made as to whether a
“bad” item was present or not in order to know in which way our estimates may
be biased. For example, a sample mean computed by avoiding the two end
observations will not be a biased estimate of the mean of a symmetric population
if both end items should actually be included or if both end items should not be
included. However, if only one of the two should not be included this estimate of
the mean will be biased.


2. Models of contamination. The performance of the various criteria for
discovery of one or more contaminators will be measured with reference to
contaminations of the following two types entering into samples of observations
from a normal population with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$:

A. One or more observations from $N(\mu + \lambda\sigma, \sigma^2)$,
B. One or more observations from $N(\mu, \lambda^2\sigma^2)$.


A represents the occurrence of an “error” in mean value such as will occur in 
dial readings when errors are made in reading incorrectly digits other than the 
last one or two digits. Errors of this sort may result from momentary shifts in 
line voltage or from the inclusion among a group of objects of one or two items 
of completely different origin. This type of contamination will be referred to as 
“location error.” B represents the occurrence of an “error” from a population 
with the same mean but with a greater variance than the remainder of the sample. 
This type of error will be referred to as a “scalar error.” It is likely that many 
errors could be better described as a combination of A and B, but a study of these 
two errors separately should throw considerable light on the question of “gross
errors” or “blunders.”
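The two models can be made concrete with a small sampling routine. The sketch below is an illustration only, not the sampling procedure used in the paper; the function name `contaminated_sample`, its parameters and the seed are hypothetical.

```python
import random

def contaminated_sample(n, k, lam, model, mu=0.0, sigma=1.0, rng=None):
    """Draw n observations, k of which come from the contaminating population.

    model "A": contaminators from N(mu + lam*sigma, sigma^2)  (location error)
    model "B": contaminators from N(mu, lam^2 * sigma^2)      (scalar error)
    Returns a list of (value, is_contaminator) pairs.
    """
    rng = rng or random.Random()
    if model not in ("A", "B"):
        raise ValueError("model must be 'A' or 'B'")
    sample = [(rng.gauss(mu, sigma), False) for _ in range(n - k)]
    for _ in range(k):
        if model == "A":
            value = rng.gauss(mu + lam * sigma, sigma)
        else:
            # random.gauss takes a standard deviation, here lam * sigma
            value = rng.gauss(mu, lam * sigma)
        sample.append((value, True))
    return sample

rng = random.Random(1950)  # arbitrary seed, for reproducibility only
location = contaminated_sample(n=15, k=1, lam=5, model="A", rng=rng)
scalar = contaminated_sample(n=15, k=1, lam=8, model="B", rng=rng)
print(len(location), sum(flag for _, flag in location), len(scalar))
```

Keeping the contaminator flag alongside each value makes it possible to check afterwards whether the extreme observation actually came from the contaminating population.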

Many authors have written on the subject of the rejection of outlying observa- 
tions. Apparently none have been successful in obtaining a general solution to 
the problem. Nor has there been success in the development of a criterion for 
discovery of outliers by means of a general statistical theory; e.g., maximum 
likelihood. A large number of criteria have been advanced on more or less intui- 
tive grounds as appropriate criteria for this purpose. In no case was investigation 
made of the performance of these criteria except for a few illustrative examples. 

References for the criteria discussed in the next section are given at the end
of this paper. Indications are given as to the significance values available in
those papers.

3. Criteria to be considered. The performance of two types of criteria has
been investigated for samples contaminated with location or scalar errors.

a) $\sigma$ known or estimated independently,

b) $\sigma$ unknown.

The $n$ observations are ordered $x_1 \le x_2 \le \cdots \le x_n$. The criteria involving
external knowledge of $\sigma$ are:

A. $\chi^2$ test,

    $\chi^2 = \sum (x - \bar{x})^2 / \sigma^2.$

B. Extreme deviation,

    $B_1 = (x_n - \bar{x})/\sigma$  (or $(\bar{x} - x_1)/\sigma$),

    $B_2 = (x_n - x_{n-1})/\sigma$  (or $(x_2 - x_1)/\sigma$).

C. Range,

    $C_1 = w/\sigma$,  $w = x_n - x_1$,

    $C_2 = w/s$  ($s$ independently estimated).

The criteria involving only the information of a single sample of $n$ observations
are:

D. Modified F test.

1. For single outlier $x_1$,

    $D_1 = S_1^2 / S^2$, where $S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2$, $S_1^2 = \sum_{i=2}^{n} (x_i - \bar{x}_1)^2$, $\bar{x}_1 = \sum_{i=2}^{n} x_i / (n - 1)$

    (or for $x_n$, $D_1 = S_n^2 / S^2$).

2. For double outliers $x_1$, $x_2$,

    $D_2 = S_{1,2}^2 / S^2$, where $S_{1,2}^2 = \sum_{i=3}^{n} (x_i - \bar{x}_{1,2})^2$, $\bar{x}_{1,2} = \sum_{i=3}^{n} x_i / (n - 2)$

    (or for $x_n$, $x_{n-1}$, $D_2 = S_{n,n-1}^2 / S^2$).

E. Ratios of ranges and subranges.

1. For single outlier $x_1$,

    $r_{10} = \frac{x_2 - x_1}{x_n - x_1}$  (or for $x_n$, $r_{10} = \frac{x_n - x_{n-1}}{x_n - x_1}$).

2. For single outlier $x_1$ avoiding $x_n$,

    $r_{11} = \frac{x_2 - x_1}{x_{n-1} - x_1}$  (or for $x_n$ avoiding $x_1$, $r_{11} = \frac{x_n - x_{n-1}}{x_n - x_2}$).

3. For single outlier $x_1$ avoiding $x_n$, $x_{n-1}$,

    $r_{12} = \frac{x_2 - x_1}{x_{n-2} - x_1}$  (or for $x_n$ avoiding $x_1$, $x_2$, $r_{12} = \frac{x_n - x_{n-1}}{x_n - x_3}$).

4. For outlier $x_1$ avoiding $x_2$,

    $r_{20} = \frac{x_3 - x_1}{x_n - x_1}$  (or for $x_n$ avoiding $x_{n-1}$, $r_{20} = \frac{x_n - x_{n-2}}{x_n - x_1}$).

5. For outlier $x_1$ avoiding $x_2$ and $x_n$,

    $r_{21} = \frac{x_3 - x_1}{x_{n-1} - x_1}$  (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $r_{21} = \frac{x_n - x_{n-2}}{x_n - x_2}$).

6. For outlier $x_1$ avoiding $x_2$ and $x_n$, $x_{n-1}$,

    $r_{22} = \frac{x_3 - x_1}{x_{n-2} - x_1}$  (or for $x_n$ avoiding $x_{n-1}$, $x_1$, $x_2$, $r_{22} = \frac{x_n - x_{n-2}}{x_n - x_3}$).
F. Extreme deviation and standard deviation.

For single outlier $x_n$,

    $F = \frac{x_n - \bar{x}}{s}$  (or for $x_1$, $F = \frac{\bar{x} - x_1}{s}$).
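Several of the criteria listed above can be computed directly from an ordered sample. The sketch below covers only the versions that test the largest observation $x_n$; the function name `criteria` and the example data are hypothetical, and no critical values are included.

```python
def criteria(xs, sigma=None):
    """Compute some of the listed criteria for the upper extreme x_n.

    xs    : observations in any order (at least 4, with distinct extremes)
    sigma : known population standard deviation, or None if unknown
    """
    x = sorted(xs)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)              # S^2
    mean_n = sum(x[:-1]) / (n - 1)                    # mean omitting x_n
    ss_n = sum((v - mean_n) ** 2 for v in x[:-1])     # S_n^2
    out = {
        "D1": ss_n / ss,
        "r10": (x[-1] - x[-2]) / (x[-1] - x[0]),
        "r11": (x[-1] - x[-2]) / (x[-1] - x[1]),
        "r20": (x[-1] - x[-3]) / (x[-1] - x[0]),
        "r21": (x[-1] - x[-3]) / (x[-1] - x[1]),
    }
    if sigma is not None:
        # criteria that need external knowledge of sigma
        out["chi2"] = ss / sigma ** 2
        out["B1"] = (x[-1] - mean) / sigma
        out["B2"] = (x[-1] - x[-2]) / sigma
        out["C1"] = (x[-1] - x[0]) / sigma
    return out

stats = criteria([1.0, 2.0, 3.0, 4.0, 10.0], sigma=1.0)
for name, value in stats.items():
    print(name, round(value, 4))
```

Note that $D_1$ is small, while the $r$ ratios and $B_1$ are large, when $x_n$ lies far from the rest of the sample.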


The performance of the large number of criteria listed here will be assessed
with respect to discovery of contamination of the type given in Section 2.

4. Performance of criteria (estimate of $\sigma$ available). The $\chi^2$ test will of course
give an indication of a large dispersion and since the extreme values are chief
contributors to the sum of squares, it is possible to use this test as a criterion for
rejecting a value or values which are at the greatest distance from the mean.
It might be supposed that $B_1$ and $B_2$ would give better results since particular
attention is paid to the end item. The same argument would influence one in
favor of $C_1$ or $C_2$. The performance of $C_2$ can, of course, be expected to vary with
the degrees of freedom in the independent estimate of $\sigma$. For this study the
degrees of freedom for this estimate were held to the single value 9 d.f.

$\chi^2$ may be used since if the value of $\chi^2$ is too large (greater than some upper
percentage point for $\chi^2$) we might reject the value most distant from the mean.
$\chi^2$ tables may be used for percentage points. Percentage points for the other
statistics considered here are given in the references at the end of this paper.

The criteria A, $B_1$, $B_2$, $C_1$, $C_2$ were investigated for $\alpha$ = 1%, 5% and 10%
for $\lambda$ = 2, 3, 5, 7, where one or more items are selected from a population
$N(\mu + \lambda\sigma, \sigma^2)$ and the remainder from $N(\mu, \sigma^2)$. Investigations were also made for one
item from $N(\mu, \lambda^2\sigma^2)$ for $\lambda$ = 2, 4, 8, 12. The investigation was carried out by
sampling methods. The performances of different criteria were assessed for the
same group of samples in order to obtain more precision in the comparison of the
different tests. All of the points appearing on the graphs in the subsequent
sections of this paper were based on from 66 to 200 determinations.

The performance of the above criteria is measured by computing the propor- 
tion of the time the contaminating distribution provides an extreme value and 
the test discovers this value. Of course, performance could be measured by the 
proportion of the time the test gives a significant value when a member of the 
contaminating population is present in the sample, even though not at an ex- 
treme. However, since it is assumed that discovery of an outlier will frequently 
be followed by the rejection of an extreme we shall consider discovery a success 
only when the extreme value is from the contaminating distribution. 

The performance was judged by applying the criteria to each sample, always 
suspecting an outlier in the direction of the shifted mean for location error. 
Since the location errors were inserted by adding a fixed value to one or more 
of the observations, the largest value was tested as an outlier. The measure of 
performance was the percentage of location errors identified. When the location 
error was not an outlier, no test was performed and a failure for the test recorded. 

In the case of the model of contamination involving the scalar error, the value 
was suspected which was farthest from the mean. This, of course, alters somewhat
the level of significance, but this procedure was followed alike for all criteria 
investigated. The performance was measured in the same fashion as for location 
errors. 
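The measure of performance for location errors can be imitated by a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions, not the paper's computation: it uses $B_1$ with $\sigma = 1$ known, a single location error, and a critical value estimated by simulation from uncontaminated samples rather than taken from published tables. All function names, trial counts and the seed are hypothetical.

```python
import random

def b1(values, sigma):
    """B_1 = (x_n - mean) / sigma for the largest observation."""
    return (max(values) - sum(values) / len(values)) / sigma

def null_critical_value(n, alpha, trials, rng):
    """Empirical upper-alpha point of B_1 for uncontaminated N(0, 1) samples."""
    sims = sorted(b1([rng.gauss(0, 1) for _ in range(n)], 1.0)
                  for _ in range(trials))
    return sims[int((1 - alpha) * trials)]

def performance(n, lam, crit, trials, rng):
    """Share of samples in which the location error is the largest value
    and B_1 exceeds the critical value (a success in the paper's sense)."""
    hits = 0
    for _ in range(trials):
        good = [rng.gauss(0, 1) for _ in range(n - 1)]
        bad = rng.gauss(lam, 1)  # one observation from N(mu + lam*sigma, sigma^2)
        if bad > max(good) and b1(good + [bad], 1.0) > crit:
            hits += 1
    return hits / trials

rng = random.Random(0)  # arbitrary seed
crit = null_critical_value(n=5, alpha=0.05, trials=20000, rng=rng)
perf = {lam: performance(5, lam, crit, 2000, rng) for lam in (2, 3, 5, 7)}
print("estimated 5% point:", round(crit, 3))
print("estimated performance by lambda:", perf)
```

A sample in which the contaminator is not the largest value is counted as a failure without testing, as described above. Because the critical value is itself simulated, the resulting proportions are only rough indications.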

Considering first location errors, a study of the performance curves showing
the per cent discovery of contaminators plotted against $\lambda$ (the number of standard
deviation units the population of contaminators is removed from the remainder)
shows that the level of performance for $\sigma$ known is considerably above the level
of performance when $\sigma$ is not known. The difference is greater for $n = 5$ than
for $n = 15$ and, of course, the difference will diminish as the sample size increases.
Figure 1 shows the performance curves for $\alpha$ = 5% (5% significance level for
the test for an outlier) of $B_1 = (x_n - \bar{x})/\sigma$ for $n = 5$ and $n = 15$ and of
$r_{10} = (x_n - x_{n-1})/(x_n - x_1)$ for $n = 5$ and $n = 15$.

The graphs for $\alpha$ = 1% and 10% would be similar in appearance. Figure 2
indicates the change in performance for $\alpha$ = 1%, 5%, and 10%. The curves
plotted are for $B_1 = (x_n - \bar{x})/\sigma$. The curves for A, $B_2$, $C_1$, $C_2$ show very similar
results.

The curve for test $B_1$ was used in Figures 1 and 2 since it gives the best
performance of all criteria which are considered here if a single location error is
present. The curves showing the comparative performance of these criteria as
well as one to be considered later ($r_{10}$) are given in Figure 3 for $\alpha$ = 5% and for
$n = 5$ and $n = 15$.

FIG. 1. Improvement in performance obtained with knowledge of $\sigma$, $\alpha$ = 5%, $n$ = 5, 15.

FIG. 2. The effect of the level of significance on the performance of $B_1$; $\alpha$ = 1%, 5%, 10%; $n$ = 5, 15.

The following statements can be made from inspection of Figure 3:

a) The differences among A, $B_1$, $B_2$, and $C_1$ are not great.

b) The knowledge of $\sigma$ is less important in larger samples.

c) The curve for $C_2$ lies above that of $r_{10}$ for $n = 5$ and below that of $r_{10}$ for
$n = 15$. This is consistent with the use of 9 d.f. in the independent estimate
of $\sigma$.

If the question of ease in computation or application is important, it may be
desirable to use $B_2$ or $C_1$ in place of $B_1$ for they are slightly easier to compute
and it is not necessary to measure all observations to obtain the value of these
statistics. From Figure 3 it will be noted that the performances of these criteria
are nearly as good as for $B_1$. If two outliers may be expected in a single sample,
the performance of $B_2$ will be lowered and the performance of $B_1$ and $C_1$ will be
improved. Any difference between the performance of $B_1$ and the performance
of $C_1$ when two outliers are present was not discernible for $n = 5$ or 15. Figure 4
illustrates the improvement in performance for $B_1$ for $\alpha$ = 5% and $n = 15$.

FIG. 3. Comparison of the performance of criteria using $\sigma$ known (or using external
estimates of $\sigma$) and $r_{10}$ for samples of size 5 and 15, $\alpha$ = 5%.

The performance curves of these criteria if a scalar error is present are very
similar to those above except that:

1. A high level of performance is approached very slowly. For example, see
Figure 5 showing the performance of $B_1$ and $r_{10}$ for $n = 5$ and $n = 15$ and $\alpha$ = 5%.

2. There is a smaller difference in the performance between the criteria with
$\sigma$ known and $\sigma$ unknown (see Figure 5).

The performance of $B_1$ and $C_1$ is noticeably increased by the introduction
of more contaminators while that of $B_2$ decreases. No difference in the performance
of $B_1$ and $C_1$ was noted for either $n = 5$ or $n = 15$. Figure 6 shows the
increase in performance of two contaminators for $B_1$ for $n = 15$, $\alpha$ = 5%.

FIG. 4. Comparison of the performance of $B_1$ for one and two location errors in samples
of size 15, $\alpha$ = 5%.

The general recommendations for possibilities of either type of contamination,
location or scalar errors, would lead one to the use of $B_1$ or $C_1$ if $\sigma$ is known.

Criterion $C_1$ is recommended since:

1. Its performance is almost as good as the performance of $B_1$ for a single
outlier. Their performances are about equal for two outliers and $C_1$ affords
protection for outliers either above or below the mean.

2. It is simple to compute.

If ease of computation is not essential and maximum performance is desired,
the criterion $B_1$ should be used. The performance of $C_2$ will approach that of
$B_1$ as the number of degrees of freedom in the denominator increases.


FIG. 5. Comparison of the performance of $B_1$ and $r_{10}$ for one scalar error for samples of
size 5 and 15, $\alpha$ = 5%.

FIG. 6. Comparison of the performance of $B_1$ for one and two scalar errors in samples of
size 15, $\alpha$ = 5%. (Legend: one error, two errors.)


5. Performance of criteria (no external estimate of $\sigma$). Criteria $D_1$ and $D_2$
have strong intuitive reasons for their use since the dispersion is estimated by
$s^2$. The $r$ ratios are attractive because of their simplicity and their preoccupation
with the extreme values. Test F is the “studentized” ratio corresponding to $B_1$,
and is equivalent to $D_1$ since $D_1 = 1 - F^2/(n - 1)$. There is no apparent
difference in the performance of $D_1$ and $r_{10}$ when one outlier is present and no
apparent difference in $D_2$ and $r_{20}$ when two outliers are present. This is true for
both models of contamination and for the three levels of significance investigated.
However the comparison of $D_2$ and $r_{20}$ was made only for $n = 5$ since critical
values are not available² for $D_2$ for $n = 15$. (Critical values are available for
$n \le 12$.)

The performance of $D_1$ and $r_{10}$ under the two models of contamination can
be obtained by reference to the curve for $r_{10}$ in Figure 1 and Figure 5. The curve
for $D_1$ is practically identical with the curve for $r_{10}$.
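The stated equivalence between $D_1$ and F can be verified numerically. The sketch below assumes that $s^2$ in F is computed with divisor $n$, that is $s^2 = \sum (x - \bar{x})^2 / n$; this normalization is an assumption made here because it is the one under which $D_1 = 1 - F^2/(n - 1)$ holds exactly, and the text does not state it. The function names and the seed are hypothetical.

```python
import math
import random

def d1_upper(xs):
    """D_1 = S_n^2 / S^2 for the largest observation x_n."""
    x = sorted(xs)
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    mean_n = sum(x[:-1]) / (n - 1)
    ss_n = sum((v - mean_n) ** 2 for v in x[:-1])
    return ss_n / ss

def f_upper(xs):
    """F = (x_n - mean) / s, with s^2 using divisor n (an assumption)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in xs) / n)
    return (max(xs) - mean) / s

rng = random.Random(7)  # arbitrary seed
xs = [rng.gauss(0, 1) for _ in range(15)]
n = len(xs)
lhs = d1_upper(xs)
rhs = 1 - f_upper(xs) ** 2 / (n - 1)
print(lhs, rhs)
```

The two printed values agree up to rounding error. With divisor $n - 1$ in $s^2$ the relation becomes $D_1 = 1 - nF^2/(n - 1)^2$, which is still a one-to-one function of F, so the two tests remain equivalent.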


² After this paper was submitted, the critical values of $D_2$ have been extended to $n \le 20$
(see references).

There is no question that $r_{10}$ is simpler to use, so that if this condition of
contamination (scalar errors) exists, $r_{10}$ would probably be chosen. However, as
before, we should investigate what happens when more than one error is present.
$D_2$ is designed for this case as is $r_{20}$. Since the performance of these two criteria
is approximately the same, $r_{20}$ would probably be chosen because of its simplicity.
Critical values for this statistic are available for $n \le 30$.

$r_{11}$, $r_{12}$, $r_{20}$, $r_{21}$, $r_{22}$ were designed for use in situations where additional
outliers may occur and we wish to minimize the effect of these outliers on the
investigation of the particular value being tested.

It has been suggested that $D_1$ could be used repeatedly to remove more than
one outlier from a sample. This procedure cannot be recommended since the
presence of additional outliers handicaps the performance of both $D_1$ and $r_{10}$
for small sample sizes and therefore the process of rejection might never get
started. For larger sample sizes the performance of $D_1$ is affected much less by
the presence of two errors than is the performance of $r_{10}$. The repetitive use of
$D_1$ is not recommended in this case either since $r_{20}$ performs in a superior manner
to $D_1$ in such situations. This difference in performance of $D_1$ and $r_{20}$
depends markedly on the level of significance used as well as the sample size.
For small samples there is little difference in performance for any of the levels
of significance one might use. For the larger sample sizes there is no appreciable
difference for very high levels of significance. The difference is however very
great for lower levels of significance. In fact as $\lambda$ increases for two errors of the
location type, the level of significance which divides the region of approach to
zero performance from the region of approach to perfect performance of $D_1$ is
given by the level of significance corresponding to a significance value of
$\frac{1}{2} \cdot \frac{n}{n-1}$ for $D_1$. Thus, for example, in samples of size 15,
$\frac{1}{2} \cdot \frac{15}{14} = .536$.
This value lies between the values for the 2.5% and 5% level of significance.
These values are .503 and .556 respectively. Therefore the use of the 1% or
2.5% levels will give poorer and poorer performance as $\lambda$ increases, and the
use of the 5% or 10% levels will give better and better performance as $\lambda$ increases
when two errors are present. The dividing point is such that for samples of
size 11 or less the use of any of the given levels of significance will cause the
performance to decrease as $\lambda$ increases. For samples of size $n \le 14$ the 1%,
2.5% and 5% levels have the same effect, and for samples of size $n \le 16$ the 1%
and 2.5%, for samples of size $n \le 19$ just the 1% level. For three such errors
the limit approached by $D_1$ as $\lambda$ increases is $\frac{2}{3} \cdot \frac{n}{n-1}$. Therefore, the
performance of $D_1$ will approach zero for all levels of significance and for all sample
sizes for which critical values are known except the 10% level of significance
for sample sizes larger than 21. An indication of these limiting values
$\frac{k-1}{k} \cdot \frac{n}{n-1}$
for $k$ contaminations present can be obtained by considering these $k$ values to
be at a distance $\lambda$ from the population mean, computing $D_1$ and allowing $\lambda$ to
increase indefinitely.

FIG. 7. Comparison of the performance of the $r$ criteria for one location error in
samples of size 5, $\alpha$ = 5%.

FIG. 8. Comparison of the performance of the $r$ criteria for one scalar error in samples
of size 5, $\alpha$ = 5%.
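The limiting values can be reproduced by following the recipe just described: place $k$ values at a large distance $\lambda$ from the mean and compute $D_1$. The sketch below is a numerical illustration with hypothetical names; the remaining $n - k$ observations are drawn from $N(0, 1)$ and $\lambda = 10^6$ stands in for the limit.

```python
import random

def d1_upper(xs):
    """D_1 = S_n^2 / S^2 for the largest observation x_n."""
    x = sorted(xs)
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    mean_n = sum(x[:-1]) / (n - 1)
    ss_n = sum((v - mean_n) ** 2 for v in x[:-1])
    return ss_n / ss

def limiting_value(n, k):
    """The limit ((k - 1) / k) * (n / (n - 1)) quoted in the text."""
    return (k - 1) / k * n / (n - 1)

rng = random.Random(3)  # arbitrary seed
n, lam = 15, 1e6        # lam is a large stand-in for lambda -> infinity
observed = {}
for k in (2, 3):
    xs = [rng.gauss(0, 1) for _ in range(n - k)] + [lam] * k
    observed[k] = d1_upper(xs)
    print(k, round(observed[k], 4), round(limiting_value(n, k), 4))
```

For $n = 15$ and $k = 2$ the limiting value is $\frac{1}{2} \cdot \frac{15}{14}$, the .536 quoted above, and the computed $D_1$ agrees with it closely.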

The comparative performance of the $r$ criteria, $\alpha$ = 5%, in samples of size 5
for the two models of contamination (one contaminator present) are given in
Figures 7 and 8. For samples of size 15 the curves are given in Figures 9 and 10.
A single curve suffices here since there is no discernible difference in the curves
for the different $r$ criteria. There is considerable difference in the performance
curves if more than one outlier is present. However, the performances of $r_{10}$,
$r_{11}$, $r_{12}$ are essentially the same when two location outliers are present as are
the performances of $r_{20}$, $r_{21}$, $r_{22}$. Figures 11 and 12 show the comparative
performance of $r_{10}$, $r_{11}$, $r_{12}$ for one and two contaminators for $\alpha$ = 5% and $n = 5$.
Figures 13 and 14 are for $n = 15$. Figures 15 and 16 show the comparative
performance for $r_{20}$, $r_{21}$ ($r_{22}$ is not a test for $n = 5$) for one and two contaminators
for $\alpha$ = 5% and $n = 5$. Figures 17 and 18 are for $r_{20}$, $r_{21}$, $r_{22}$ for $n = 15$. The
six curves represented by the single curve of Figure 17 lie within 5% of the
curve shown. The same is true of the three curves represented by each of the
two curves of Figure 18.

FIG. 9. Performance of the $r$ criteria for one location error in samples of size 15, $\alpha$ = 5%.

FIG. 10. Performance of the $r$ criteria for one scalar error in samples of size 15, $\alpha$ = 5%.

FIG. 11. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two location
errors in samples of size 5, $\alpha$ = 5%.

FIG. 12. Comparison of the performance of the $r_{1\cdot}$ criteria for one and two scalar
errors in samples of size 5, $\alpha$ = 5%.

Fig. 13. Comparison of the performance of the r1. criteria for one and two location errors in samples of size 15, α = 5%.

Fig. 14. Comparison of the performance of the r1. criteria for one and two scalar errors in samples of size 15, α = 5%.

Fig. 15. Comparison of the performance of the r2. criteria for one and two location errors in samples of size 5, α = 5%.

Fig. 16. Comparison of the performance of the r2. criteria for one and two scalar errors in samples of size 5, α = 5%.

Since no loss in performance results for larger samples from the use of r20, 
r21, r22 in place of r10, r11, r12, and further, these criteria are not appreciably 
affected by the presence of another outlier, it would seem unwise to recommend 
the use of r10, r11, r12. However, note that for small samples (see Figures 11 and 
12) the performances of r10, r11 and r12 are considerably better when a single 
outlier is present. Therefore in larger (n > 10) samples ro or 72 would appear 
to be the best criteria. In samples of size 10 or less, 719 or 72 should be used; 
ra if the extreme value at the opposite end should be avoided. 

It should be noted in the comparisons that no model of contamination was 
investigated which would cause one or more errors at both extremes in the 
sample. It is obvious that the performance of D1 and D2 would be considerably 
decreased while the performance of r11, r12, and r21, r22 would not be materially 
affected since these criteria avoid values at the opposite extreme. Their repeated 
use might discover most of such outliers, while D1 or D2 might fail on the first 
trial. 

Fig. 17. Comparison of the performance of the r2. criteria for one and two location errors in samples of size 15, α = 5%.

Fig. 18. Comparison of the performance of the r2. criteria for one and two scalar errors in samples of size 15, α = 5%.

Fig. 19. Performance of B1 for various levels of significance when the population is 10% contaminated with location errors.


6. Sampling from a contaminated population. In the previous sections the 
performance of the various criteria was assessed for samples where a certain 
number of contaminators were present. One might well ask why a test is needed 
if it is known that contaminators are present. It would seem more realistic to 
state that a certain per cent of contamination will occur in the long run and 
that one will not know in any particular case whether 0, 1, 2, ... contaminators 
will be present. One would then wish a criterion to indicate the presence of 
contamination in a particular sample. 

The performances of these criteria will be investigated for the same two 
models of contamination and their performances will be reported as per cent of 
total contamination discovered. The tests will be applied only once to each 
sample. Repeated use of the criterion would in many cases increase the per cent 
of total contamination discovered. It is not known what effect such a procedure 
would have on the level of significance. 

Fig. 20. Performance of B1 for various levels of significance when the population is 10% contaminated with scalar errors.

Fig. 21. Performance of B1 for various levels of contamination for location errors and using the 5% level of significance.

Investigation has been made for 5, 10, and 20% contamination. For example, 
in samples of size 5 which have 10% contamination, on the average, 59.0% of 
the samples will contain no “errors”, 32.8% will contain one, 7.3% two, 0.8% 
three, 0.1% four, and 0.0% five. Thus in 100 samples of 5 which are 10% 
contaminated with location errors having mean μ + 5σ, about 59 contain no errors. 
If the r10 criterion is used with a 5% level of significance one value will be 
“discovered” in 3.0 of the samples containing no errors. Of the 33 samples containing 
one “error” the “error” would be discovered in 18 of these samples. This criterion 
would discover none of the “errors” in samples containing more than one “error”. 
We would have obtained 18 of the 50 contaminating values and 3 which 
were members of the original population. 

Fig. 22. Performance of B1 for various levels of contamination for scalar errors and using the 5% level of significance.

Fig. 23. Performance of r10, D1, r20, D2 in samples of size 5 using the 5% level of significance and sampling from a population which is 10% contaminated. Panels: (Location), (Scalar).
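
The percentages quoted in this example are what one would expect if each of the five observations is a contaminator independently with probability 0.10, so that the number of contaminators per sample is binomial. The short Python sketch below works under that assumption; the function name `contaminator_distribution` is ours and does not appear in the paper. It prints unrounded values, so the entries for four and five contaminators both come out below 0.05%.

```python
from math import comb

def contaminator_distribution(n, p):
    """Return [P(0), ..., P(n)]: binomial probabilities of finding 0..n
    contaminators in a sample of n, assuming each observation is a
    contaminator independently with probability p (our assumption)."""
    return [comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

probs = contaminator_distribution(5, 0.10)
for j, pj in enumerate(probs):
    print(f"{j} contaminators: {100 * pj:.2f}% of samples")

# Expected number of contaminating values in 100 samples of size 5.
expected_total = 100 * sum(j * pj for j, pj in enumerate(probs))
print(f"expected contaminators in 100 samples: {expected_total:.1f}")
```

The expected total of 50 contaminating values in 100 samples of 5 is the denominator used in the text when it reports 18 of the 50 contaminators as discovered.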

When σ is known the performance will increase when more contaminators 
are present. Performance however has been measured in terms of finding a 
single contaminator; i.e., the test has been used only once. Therefore, as 
measured here, the level of performance will decrease with increasing per cent 
contamination. Repeated use of the test criteria has not been investigated. 

Fig. 24. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of significance when the population is 10% contaminated with location errors.

Fig. 25. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of significance when the population is 10% contaminated with scalar errors.


Criterion B1 gives the best performance for both location and scalar errors for 
the levels of contamination and levels of significance considered. A and C1 are 
only slightly inferior. B1 is handicapped when more than one error is present; 
thus its performance is poorer for heavier contamination. Figure 19 shows the 
performance of B1 for the different levels of significance, 10% contamination, 
and the two sample sizes 5 and 15 for location errors. Figure 20 shows the results 
for scalar errors. Figures 21 and 22 show the performance of B1 for the 5% 
level of significance for the different levels of contamination. 

When σ is not known the performance of various criteria will eventually 
decrease as more and more contaminators are present in the sample even though 
several of the criteria show improvement in discovering a single error if two 
are present. The performance of these criteria is greatly affected by the size 
of the sample. For samples of size 5, r10 and D1 perform alike, r10 being superior 
to the other r's (r20 second best) for the levels of contamination considered, 
and D2 is inferior to r20. Figure 23 compares the performance of r10, D1, r20, 
and D2 for the 5% level of significance and 10% contamination. The results 
for other levels of significance and contamination are comparable. 

Fig. 26. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of contamination for location errors and using the 5% level of significance.

Fig. 27. Performance of r10 (D1) and r22 (D1, r20, r21) for various levels of contamination for scalar errors and the 5% level of significance, α = 5%. Curves: r10 (D1), n = 5; r22 (D1, r20, r21), n = 15.

For samples of size 15, r20, r21 and r22 perform alike, as do r10, r11 and r12. D1 
and r22, r20, r21 perform approximately the same and are superior to r10, r11, 
and r12. Critical values are not available for D2 for n > 12. The performances 
of D1, r20, r21 and r22 are indicated by a single line in Figures 24, 25, 26, and 27, 
which show the effect of level of significance and level of contamination on the 
performance of D1, r20, r21 and r22 for samples of size 15 and of r10 (D1) for 
samples of size 5. 

Fig. 28. A comparison of the performance of r22 and D1 for two scalar contaminators when tests are made at one extreme only, α = 5%, n = 15.


7. Remarks and conclusions. Throughout the investigation of performance, 
location errors were placed only at one extreme and scalar errors at either ex- 
treme. The test for an error was made using as a suspected value the extreme 
value in the direction of the location error or in the case of the scalar error the 
value most distant from the mean. It can be expected then that if performance 
were assessed when location errors could occur in either direction, different 
results would be obtained. Also in the case of scalar errors if errors were always 
sought at one particular extreme or at both extremes different results would be 
obtained. If these changes were made in the models of contamination, those 
criteria designed to avoid errors at the other extreme would have an advantage 
over those which were not so designed for σ unknown. If σ is known the criteria 
which do not avoid the other extreme would have an advantage over those 
which do avoid the other extreme. These points just mentioned will be used to 
discriminate between those criteria which were judged to be equal in performance 
under the models used in the sampling study. For example, Figure 28 
compares the performance of r22 and D1 for two scalar contaminators when 
tests are made only at one extreme, α = 5%, n = 15. 

1. For σ known: 

B1 or C1 should be used, or in small samples A, B1, or C1 should be used. 

2. For σ unknown: 

r10 should be used for very small samples. r22 should be used for sample sizes 
over 15. Probably r21 would be best for sample sizes from about 8 to 13. If 
simplicity in computation is not important and “errors” are not expected at both 
extremes D1 would do equally well. When critical values are available for larger 
n, D2 should prove useful in the larger sample sizes. 


DISTRIBUTIONS RELATED TO COMPARISON OF TWO 
MEANS AND TWO REGRESSION COEFFICIENTS 


By Uttam Chand¹ 


University of North Carolina 


Summary. We consider here the relative merits of different statistics avail- 
able for testing two means or two regression coefficients in relation to one-sided 
(asymmetric) and two-sided (symmetric) alternatives in case of unequal popula- 
tion variances. In so far as the Behrens-Fisher statistic is concerned we confine 
ourselves to the consideration of the behavior of its probability of Type I error 
in repeated sampling from populations with a fixed value of the unknown ratio 
of variances. In connection with the tests between two means, the present 
study takes its point of departure from the existing tests and investigates the 
question of utilizing an approximately determinate knowledge about the un- 
known ratio of variances. In connection with the comparison of two regression 
coefficients and also of two linear regression functions, we consider the effect of 
two concomitant sources of variation, viz., the unknown ratio of residual variances 
and the ratio of the sums of squares of the fixed variates, on the probability of 
Type I and Type II errors of certain well known statistics. 


1. Introduction. Consider two independent samples $x_1, \dots, x_{n_1+1}$ and $x'_1, \dots, x'_{n_2+1}$ 
drawn from two normal populations with means $m_1$ and $m_2$, variances $\sigma_1^2$ and $\sigma_2^2$. 
Let $K = \sigma_1^2/\sigma_2^2$. If K is known and $m_1 = m_2$, the quantity 

$$t_K = (\bar{x} - \bar{x}') \left[ \frac{S(x - \bar{x})^2 + K\,S(x' - \bar{x}')^2}{n_1 + n_2} \left( \frac{1}{n_1 + 1} + \frac{1}{K(n_2 + 1)} \right) \right]^{-1/2}$$

($t_1$ is Fisher's t) is distributed according to “Student's” distribution with $n_1 + n_2$ 
d.o.f.² and for the “Student's” hypothesis $H_0: m_1 = m_2$ provides a uniformly most 
powerful test against an asymmetric alternative $H_1: m_1 >$ (or $<$) $m_2$ and a 
type $B_1$ test against a symmetric alternative $H_2: m_1 \ne m_2$. If K is unknown 
certain approximate and exact tests have been suggested from time to time to 
meet this situation. 

Welch [1], [2], using an approximation to the distribution of $t_1$, was the first 
to point out that if K is unknown and we assume it to be equal to unity, then 
the probability of Type I error of the $t_1$-test is subject to large variations as K 
varies from 0 to ∞. He also pointed out that the statistic 

$$v = (\bar{x} - \bar{x}') \left[ \frac{S(x - \bar{x})^2}{n_1(n_1 + 1)} + \frac{S(x' - \bar{x}')^2}{n_2(n_2 + 1)} \right]^{-1/2},$$

which does not have “Student's” distribution for K = 1, has the advantage 
that its probability of Type I error is subject to less variation with respect to K. 
His approximate results were later confirmed by Hsu [3] who obtained the 
distribution of quantities $u_1 (= t_1^2)$ and $u_2 (= v^2)$ and also showed that these tests 
are unbiased in the sense of Neyman and Pearson. Hsu concluded on the basis 
of his investigations that when the sample sizes are equal and not very small, 
we may safely use $u_1 (= u_2)$ as if K were unity. This also had been pointed out 
by Welch. 

¹ Now Assistant Professor of Mathematical Statistics at Boston University. 
² Degrees of freedom. 
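
As a concrete illustration of the two statistics defined above, the following Python sketch computes the pooled statistic for an assumed variance ratio k and Welch's statistic v from two samples. The function names (`t_stat`, `v_stat`) and the data are ours and purely illustrative; the formulas follow the definitions in this section, with $n_i$ read as degrees of freedom (sample sizes $n_i + 1$).

```python
from math import sqrt

def _mean_and_ss(sample):
    """Return the sample mean and the sum of squares about the mean."""
    m = sum(sample) / len(sample)
    return m, sum((x - m) ** 2 for x in sample)

def t_stat(x, y, k):
    """Pooled statistic computed as if sigma1^2 / sigma2^2 were equal to k."""
    n1, n2 = len(x) - 1, len(y) - 1          # d.o.f.; sample sizes are n_i + 1
    mx, sx = _mean_and_ss(x)
    my, sy = _mean_and_ss(y)
    pooled = (sx + k * sy) / (n1 + n2)        # estimates sigma1^2 when k = K
    scale = 1.0 / (n1 + 1) + 1.0 / (k * (n2 + 1))
    return (mx - my) / sqrt(pooled * scale)

def v_stat(x, y):
    """Welch's statistic v: each variance is estimated separately."""
    n1, n2 = len(x) - 1, len(y) - 1
    mx, sx = _mean_and_ss(x)
    my, sy = _mean_and_ss(y)
    return (mx - my) / sqrt(sx / (n1 * (n1 + 1)) + sy / (n2 * (n2 + 1)))

x = [4.1, 5.3, 6.0, 5.5, 4.8]                # made-up data, n1 + 1 = 5
y = [3.9, 4.4, 5.1]                          # made-up data, n2 + 1 = 3
n1, n2 = len(x) - 1, len(y) - 1
k_star = n1 * (n1 + 1) / (n2 * (n2 + 1))     # value of k at which the pooled form equals v
print(t_stat(x, y, 1.0), v_stat(x, y), t_stat(x, y, k_star))
```

For equal sample sizes the pooled statistic with k = 1 and v coincide, which is the case Hsu's conclusion refers to.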

If on the basis of past experience some approximate value k of K were available, 
one would like to know if such a choice in some rough neighborhood of K would 
in any way improve the claim of $t_k$ ($= t_K$ for K = k) for the hypothesis $m_1 = m_2$. 
The distribution of this generic quantity $t_k$ ($= t_1$ for k = 1; $= v$ for 
$k = n_1(n_1+1)/[n_2(n_2+1)]$) will be obtained in Section 2.1. It will be shown that 
the variation in the probability of Type I error of $t_k$ with respect to K for any k, 
except when $t_k = v$, is essentially similar in character to that of $t_1$ [3] and is very 
sensitive in a neighborhood of K in which one would very often be interested 
(Section 2.4). This is also true of the behavior of the power function of $t_k$ with 
respect to K. Consequently a $t_k$ type of statistic will be unsuitable in general for 
utilizing an approximately determinate knowledge of K. 

It is not possible to infer the relative merits of $t_1$ and v in relation to 
asymmetric aspects of “Student's” hypothesis directly from Hsu's work. His basic 
conclusions as regards unbiasedness and the nature of variations in Type I 
error in the symmetric case also hold for the asymmetric case except that the 
Type I variations in $t_1$ and v are less for asymmetric than for symmetric 
comparisons (Section 2.5 and Table II). Furthermore it appears (Section 2.5 and 
Table III) that with respect to the variations of K both the asymmetric and 
symmetric power functions of $t_1$ are likely to be more sensitive than those of v. 
Since for equal d.o.f. both the asymmetric probability of Type I error and 
power function are insensitive to the vagaries of the ‘nuisance’ parameter K, 
there is an a fortiori reason for using v ($= t_1$) as if K were unity. 

Scheffé [4] considered the statistic 

$$(\bar{x} - \bar{x}') \left[ \frac{\sum_{i=1}^{n_1+1} (u_i - \bar{u})^2}{n_1 (n_1 + 1)} \right]^{-1/2} \qquad (n_1 \le n_2),$$

(equivalent to paired difference t when $n_1 = n_2$) where $u_i = x_i - \left(\frac{n_1+1}{n_2+1}\right)^{1/2} x'_i$, 
and where it is assumed that the variates in each sample have been randomized. 
This is essentially a “Student's” t comparison based on $n_1$ d.o.f. and as shown by 
Scheffé it is impossible to get a suitable statistic with the t-distribution with 
more than $n_1$ d.o.f. The statistic v has the t-distribution only when K = ∞ ($n_1$ 
d.o.f.), K = 0 ($n_2$ d.o.f.) and $K = n_1(n_1+1)/[n_2(n_2+1)]$ ($n_1 + n_2$ d.o.f.). For any given 
$n_1$, $n_2$, K and P we can solve $P = P(v > t_0 \mid H_0)$ for $t_0$ and thus indirectly obtain 
from the tabulated values of the t-distribution the number of ‘effective’ d.o.f. 
which will thus adjust v to any preassigned level of significance. We try to 
show in Section 2.6 that in situations where some approximate knowledge of K 
is available, the statistic v seems to have a decided advantage over any other 
statistic having the t-distribution. We show by actual computations that Welch's 
formula [2] provides a conservative estimate for the effective d.o.f. in the light of 
which this comparison will be considered. 
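
Scheffé's statistic, introduced two paragraphs above and called S in Section 2.6, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are ours, the smaller sample is passed first, the second sample is taken to be already randomized, and only its first $n_1 + 1$ observations enter the $u_i$ (its mean uses all of them). The helper `paired_t` is included only to check the remark that the two coincide for equal sample sizes.

```python
from math import sqrt

def scheffe_stat(x, y):
    """Scheffe-type statistic; x must be the smaller (or equal-sized) sample."""
    N1, N2 = len(x), len(y)                  # N_i = n_i + 1 observations
    if N1 > N2:
        raise ValueError("x must not be larger than y")
    ratio = sqrt(N1 / N2)
    # zip stops at the shorter sample, so only the first N1 values of y are used.
    u = [xi - ratio * yi for xi, yi in zip(x, y)]
    u_bar = sum(u) / N1
    ss_u = sum((ui - u_bar) ** 2 for ui in u)
    diff = sum(x) / N1 - sum(y) / N2
    return diff / sqrt(ss_u / (N1 * (N1 - 1)))   # N1 - 1 = n1 d.o.f.

def paired_t(x, y):
    """Ordinary paired-difference t for two samples of equal size."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    d_bar = sum(d) / n
    ss = sum((di - d_bar) ** 2 for di in d)
    return d_bar / sqrt(ss / (n * (n - 1)))

x = [5.1, 4.8, 6.2, 5.5]                     # made-up data
y = [4.0, 4.9, 5.0, 4.2]
print(scheffe_stat(x, y), paired_t(x, y))    # identical for equal sample sizes
```

With unequal sizes the statistic is referred to the t-distribution with $n_1$ d.o.f., the smaller of the two, which is the price paid for exactness.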

The Behrens-Fisher fiducial test employing the statistic d [5], [6], which has 
essentially the same structural form as v, has given rise to much controversy 
essentially because of inconsistencies arising from tests of significance based 
on the fiducial distribution of unknown parameters. We attempt to show in 
Section 2.7 that the fiducial test in general is ‘conservative’ in detecting significant 
results in repeated sampling from populations with a fixed value of the unknown 
ratio of variances. 

In the case of comparison of two regression coefficients when the residual 
variances are unequal, we are faced with a similar type of problem. Consider 
two samples $y_\mu \mid x_\mu$ and $y'_\nu \mid x'_\nu$ ($\mu = 1, \dots, n_1 + 1$; $\nu = 1, \dots, n_2 + 1$), where 
$x_\mu$ and $x'_\nu$ are fixed and $y_\mu$ and $y'_\nu$ are normally and independently distributed 
according to $N(\alpha_1 + \beta_1(x_\mu - \bar{x}), \sigma_1^2)$ and $N(\alpha_2 + \beta_2(x'_\nu - \bar{x}'), \sigma_2^2)$ respectively. 
For the hypothesis $\beta_1 = \beta_2$ when the alternatives do not specify anything except 
$\beta_1 > \beta_2$ or $\beta_1 < \beta_2$, or $\beta_1 \ne \beta_2$, we shall consider the merits of statistics $t^*$ and $v^*$ 
which correspond to statistics $t_1$ and v for the two means. While the statistic $t^*$ 
is sensitive to the variation of both $K = \sigma_1^2/\sigma_2^2$ and w, the ratio of the sums of 
squares of the fixed variates, the statistic $v^*$ is insensitive to the variation of 
both. Barankin³ has extended Scheffé's test to the comparison of two regression 
coefficients under the above assumptions. The statistic proposed by Barankin 
has Student's distribution with $n_1 - 1$ d.o.f. ($n_1 \le n_2$) and provides the only 
exact unbiased test so far known. While Scheffé's test for the comparison of 
two means and Barankin's test for the comparison of two regression coefficients 
should not be used when K is known and were never intended to utilize any 
available approximate information about K, the question of investigating 
the possibility of using $v^*$ in the latter situation is not without interest (Section 3). 
In Section 4 we consider the hypothesis of equality of two linear regression 
functions, viz., $H_0: \alpha_1 = \alpha_2, \beta_1 = \beta_2$, when the alternatives do not specify anything 
except $\alpha_1 \ne \alpha_2$ or $\beta_1 \ne \beta_2$. 

In studying the behavior of the power function and the probability of Type I 
error of certain statistics under discussion we have made full use of Hsu’s method 
and consequently only essential details have been given here. 


2. Hypothesis of equality of two means when variances are unequal 


2.1. The distribution of $t_k$ for any values of $n_1$ and $n_2$. Consider the test function 
$t_k$ ($= t_K$ for K = k; Section 1) where k is some inexact value of K. This can be 
put in the form $t_k = (\xi + \delta)(b\chi_1^2 + c\chi_2^2)^{-1/2}$, where ξ is N(0, 1) and the χ²'s 
have independent χ²-distributions with $n_1$ and $n_2$ d.o.f., and where 

$$\delta = (m_1 - m_2)\left(\frac{\sigma_1^2}{n_1+1} + \frac{\sigma_2^2}{n_2+1}\right)^{-1/2},$$

$$b = (K/k)\,(n_1 + n_2)^{-1}\,[k(n_2 + 1) + n_1 + 1]\,[K(n_2 + 1) + n_1 + 1]^{-1},$$

$$c = (n_1 + n_2)^{-1}\,[k(n_2 + 1) + n_1 + 1]\,[K(n_2 + 1) + n_1 + 1]^{-1},$$

$$b/c = K/k.$$

³ E. W. Barankin, “Extension of the Romanovsky-Bartlett-Scheffé test,” Proc. Berkeley 
Symposium on Math. Stat. and Prob., University of California Press, 1949, pp. 433-449. 
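
The representation of $t_k$ given in Section 2.1 lends itself to a quick simulation check. The Python sketch below is a minimal illustration, not the author's method: it draws ξ and the two χ² variates directly, sets δ = 0, and estimates the two-tailed probability of Type I error for several assumed values k at once. The function names, the seed and the number of draws are our choices; the parameters K = 5, $n_1 = 2$, $n_2 = 4$, $t_0 = 2.447$ are those of Table I, and reading that table as two-tailed is an assumption based on 2.447 being the two-tailed 5% point of t for 6 d.o.f.

```python
import random
from math import sqrt

def coefficients(K, k, n1, n2):
    """Return (b, c) as defined in Section 2.1."""
    common = (k * (n2 + 1) + n1 + 1) / ((n1 + n2) * (K * (n2 + 1) + n1 + 1))
    return (K / k) * common, common

def simulate_type1(K, ks, n1, n2, t0, draws=100_000, seed=1):
    """Monte Carlo estimate of P(|t_k| > t0) under delta = 0 for each k in ks."""
    rng = random.Random(seed)
    coefs = [coefficients(K, k, n1, n2) for k in ks]
    hits = [0] * len(ks)
    for _ in range(draws):
        xi = rng.gauss(0.0, 1.0)
        chi1 = rng.gammavariate(n1 / 2.0, 2.0)   # chi-square with n1 d.o.f.
        chi2 = rng.gammavariate(n2 / 2.0, 2.0)   # chi-square with n2 d.o.f.
        for j, (b, c) in enumerate(coefs):
            # |t_k| > t0 is equivalent to |xi| > t0 * sqrt(b*chi1 + c*chi2)
            if abs(xi) > t0 * sqrt(b * chi1 + c * chi2):
                hits[j] += 1
    return [h / draws for h in hits]

rates = simulate_type1(5.0, [1.0, 5.0, 7.0], 2, 4, 2.447)
print(rates)   # estimates for k = 1, 5, 7; the middle one is the exact-t case
```

When k = K the coefficients reduce to $b = c = (n_1 + n_2)^{-1}$ and the estimate should sit near the nominal 5%; too small a k inflates the error and too large a k deflates it.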


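In what follows we shall omit the subscript k from t. The joint probability 
element of ξ, $\chi_1^2$ and $\chi_2^2$ is given by 

$$dF(\xi, \chi_1^2, \chi_2^2) = (2\pi)^{-1/2}\,[\Gamma(n_1/2)\Gamma(n_2/2)]^{-1}\,2^{-2} \exp\left[-\tfrac12(\xi^2 + \chi_1^2 + \chi_2^2)\right] (\chi_1^2/2)^{(n_1/2)-1} (\chi_2^2/2)^{(n_2/2)-1}\, d\xi\, d(\chi_1^2)\, d(\chi_2^2).$$

We transform to new variables t, r and θ by the relations 

$$\xi + \delta = t\,(b\chi_1^2 + c\chi_2^2)^{1/2},$$
$$b\chi_1^2 = r^2\cos^2\theta \qquad (0 < \theta < \pi/2),$$
$$c\chi_2^2 = r^2\sin^2\theta \qquad (-\infty < t < +\infty),$$

and integrate out r. To integrate out θ we put $z = \sin^2\theta$ if b < c and $z = \cos^2\theta$ 
if b > c. This reduces the integration w.r.t. θ to a series of hypergeometric 
integrals. We finally have the following form for the frequency function of $t_k$: 

$$q(t) = \frac{e^{-\delta^2/2}\,(b/c)^{n_2/2}\,b^{1/2}}{\Gamma\!\left(\tfrac12\right)\Gamma\!\left(\frac{n_1+n_2}{2}\right)} \sum_{r=0}^{\infty} \frac{(\delta t)^r\,(2b)^{r/2}\,\Gamma\!\left(\frac{n_1+n_2+r+1}{2}\right)}{r!\,(1 + bt^2)^{(n_1+n_2+r+1)/2}}\; F\!\left(\frac{n_1+n_2+r+1}{2}, \frac{n_2}{2}; \frac{n_1+n_2}{2}; \frac{1 - b/c}{1 + bt^2}\right) \tag{2.1.1}$$

where F denotes the hypergeometric function. As a check if we put $b = c = (n_1 + n_2)^{-1}$, 
we get the frequency function of non-central t for $n_1 + n_2$ d.o.f. For 
the case b > c we have only to interchange b with c and $n_1$ with $n_2$. 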
The null distribution of $t_k$ (δ = 0) is an even function of t; consequently the 
forms of the single and two-equal-tailed probability of Type I error will be the 
same except for the constant ½. If we let $\beta_1(\delta, K, k, n_1, n_2) = \int_{t_0}^{\infty} q(t)\,dt$ denote the 
single upper tail power function of $t_k$, from (2.1.1) we obtain 

$$\beta_1(\delta, K, k, n_1, n_2) = \tfrac12\, e^{-\delta^2/2}\,(K/k)^{n_2/2} \sum_{h=0}^{\infty}\sum_{r=0}^{\infty} \frac{(\delta^2/2)^{r/2}\,\Gamma\!\left(\frac{n_2}{2} + h\right)(1 - K/k)^h}{\Gamma\!\left(\frac{r}{2} + 1\right)\Gamma\!\left(\frac{n_2}{2}\right)\Gamma(h + 1)}\; I_{x_0}\!\left(\frac{n_1+n_2}{2} + h, \frac{r+1}{2}\right) \tag{2.1.2}$$

where $x_0 = (1 + bt_0^2)^{-1}$ and $I_{x_0}(p, q)$ is the incomplete beta ratio. To obtain the 
two equal tailed power function $\beta_2(\delta, K, k, n_1, n_2)$ we need only change r into 2r 
and omit the factor ½. 

2.2. Distribution of $t_k$ for even values of $n_1$ and $n_2$. (For notation refer to Section 
2.1.) When $n_1$ and $n_2$ are even, the method of characteristic functions yields a 
single infinite series for the distribution of $t_k$, and when δ = 0 this series reduces 
to $\tfrac12(n_1 + n_2)$ terms. The characteristic function of $X = b\chi_1^2 + c\chi_2^2$ is given by 
$\phi(\tau) = (1 - 2bi\tau)^{-n_1/2}(1 - 2ci\tau)^{-n_2/2}$. To obtain the form of the frequency 
function of X we make use of the inversion theorem and integrate round a standard 
contour in the lower half of the complex plane. The distribution of $t_k$ can then be 
obtained from the joint probability element of ξ and X. We obtain the following 
form for the single tailed power function of $t_k$: 

$$\beta_1(\delta, K, k, n_1, n_2) = \tfrac12\, e^{-\delta^2/2} \sum_{r=0}^{\infty} \frac{(\delta^2/2)^{r/2}}{\Gamma\!\left(\frac{r}{2} + 1\right)} \left[ \left(\frac{K}{K-k}\right)^{n_2/2} \sum_{h=0}^{(n_1/2)-1} \frac{\Gamma\!\left(\frac{n_2}{2} + h\right)}{\Gamma\!\left(\frac{n_2}{2}\right) h!} \left(\frac{-k}{K-k}\right)^{h} I_{x_0}\!\left(\frac{n_1}{2} - h, \frac{r+1}{2}\right) + (-1)^{n_1/2} \left(\frac{k}{K-k}\right)^{n_1/2} \sum_{h=0}^{(n_2/2)-1} \frac{\Gamma\!\left(\frac{n_1}{2} + h\right)}{\Gamma\!\left(\frac{n_1}{2}\right) h!} \left(\frac{K}{K-k}\right)^{h} I_{x_0'}\!\left(\frac{n_2}{2} - h, \frac{r+1}{2}\right) \right] \qquad (K > k)$$

where $x_0$ has been defined in the previous section and $x_0' = (1 + ct_0^2)^{-1}$. 

2.3. Unbiasedness of a test based on $t_k$. Since the single and two tailed forms 
of the power function of $t_k$ (Section 2.1) are essentially the same functions of the 
standardised ‘distance’ δ, following Hsu [3] we can show that $\partial\beta_1/\partial\delta > 0$ and 
$\partial\beta_2/\partial\delta > 0$ for any fixed K and k; and consequently such a generic type of 
statistic provides an unbiased test both against symmetric and asymmetric alternatives. 

2.4. Variations in the power function and the probability of Type I error of $t_k$. 
For the case k = 1, Hsu [3] has already shown that the probability of Type I 
error of the statistic $t_1$ is subject to large variations w.r.t. K. He also pointed 
out that the behavior of the derivative of its power function w.r.t. K for fixed δ 
was similar to that of its probability of Type I error w.r.t. K. We shall presently 
see that $t_k$ also shares this property with $t_1$. 

In the first place one would like to know if any choice of k in a small neighborhood 
of K would stabilize the variations in the Type I error of $t_k$ to such an 
extent as to make it approximately insensitive to the difference between k and 
K. With this end in view we shall examine the nature of variations in the 
probability of Type I error of $t_k$ w.r.t. K for any fixed k. 
From (2.1.2) by putting δ = 0 we obtain 

$$P = P(t_k > t_0) = \tfrac12\,(K/k)^{n_2/2} \sum_{h=0}^{\infty} \Gamma\!\left(\frac{n_2}{2} + h\right)(1 - K/k)^h \left[\Gamma\!\left(\frac{n_2}{2}\right)\Gamma(1 + h)\right]^{-1} I_{x_0}\!\left(\frac{n_1+n_2}{2} + h, \tfrac12\right). \tag{2.4.1}$$

We now differentiate (2.4.1) and after simplification obtain 

$$\frac{dP}{dK} \le C_1\,(K/k)\,[n_2(n_2 + 1) - n_1(n_1 + 1)/k]\,[K(n_2 + 1) + n_1 + 1]^{-1} \qquad (K < k).$$

Similarly 

$$\frac{dP}{dK} \ge C_2\,[n_2(n_2 + 1) - n_1(n_1 + 1)/k]\,[K(n_2 + 1) + n_1 + 1]^{-1} \qquad (K > k),$$

where $C_1$ and $C_2$ are certain positive constants independent of K and k. 


If $k = n_1(n_1 + 1)/[n_2(n_2 + 1)]$ we have 

$$\frac{dP}{dK} \le 0 \ \text{ for } K < k, \qquad \frac{dP}{dK} \ge 0 \ \text{ for } K > k.$$

This is the case when $t_k$ is identical with the statistic v defined in Section 1 
and the probability of Type I error curve expressing P as a function of K has a 
minimum at this point: for $n_1 < n_2$ the minimum occurs for a value of K < 1 
and vice versa. And since v is known to be insensitive to the variation of K [3], 
$t_k$ is insensitive to the variation of K for this value of k. 

For any other assumed value of k the curve either starts decreasing from 
K = ∞ or from K = 0 to the point where K = k depending upon the values of 
$n_1$ and $n_2$. In each case the ordinate of the curve continues to decrease for some 
distance; it may decrease to a minimum and then start increasing or else decrease 
indefinitely. For fixed δ the power function of $t_k$ also has a minimum when 
$K = k = n_1(n_1 + 1)/[n_2(n_2 + 1)]$; and for any other k the behavior of its power function is 
similar to that of its probability of Type I error. For the case k = 1 numerical 
values of the single and two-tailed probability of Type I error 
and power function for different values of $n_1$, $n_2$ and K are given in Tables II 
and III (Section 2.5). 

In certain practical situations it may happen for example that on the basis 
of past experience one can determine k so that ½ < |k − K| < 2. The question 
arises: how sensitive is $t_k$ to such a neighborhood for any k, K, $n_1$ and $n_2$? 
That it is hard to provide a practically useful answer to this question will be 
apparent from the nature of the distribution of $t_k$, which depends both on 
K and k and not merely on their ratio. The following Table I will indicate how 
in such a small neighborhood $P(t_k > t_0)$ can be in serious error in two different 
directions. 

TABLE I 
Variations in $P(t_k > t_0)$ with respect to k for fixed K 
(K = 5; $n_1$ = 2; $n_2$ = 4; $t_0$ = 2.447) 

k =               1       2       3       4       5       6       7 
$P(t_k > t_0)$ =  .1129   .0936   .0749   .0607   .05     .0418   .0355 


2.5. Statistics $t_1$ and v in relation to asymmetric and symmetric aspects of 
“Student's” hypothesis. Statistics $t_1$ and v are special cases of $t_k$ and the behavior 
of their probability of Type I error and power function has already been discussed 
(Sections 2.3 and 2.4). In this section we compare the single-tailed and two-tailed 
values of the probability of Type I error and power function in the light 
of several particular examples. In all these calculations, e.g. in $P(t > t_0)$ and 
$P(|t| > t_0)$, $t_0$ refers to the single-tailed and two-tailed values, respectively, of Fisher's t 
for the appropriate number of d.o.f. Tables II and III give the approximate 
values for the probability of Type I error and the power function respectively 
both against symmetric and asymmetric alternatives. 


TABLE II 
Variations in the symmetric and asymmetric probability of Type I error of v and $t_1$ in 
relation to the unknown ratio of variances K 

Statistic   d.o.f.             K = 0     .125      .5       1      2      4       8      16     ∞      % point of tabulated t 
v = t1      n1 = n2 = 3        .074      .0633     .0504    .05    .0504  .0568   .0633  .0691  .074   single tailed 5% 
v = t1      n1 = n2 = 3        .092      .0681     .0525    .05    .0525  .0597   .0681  .0770  .092   two-tailed 5% 
v = t1      n1 = n2 = 3        .034      .0181     .0110    .01    .0110  .0138   .0181  .0227  .034   two-tailed 1% 
v           n1 = 4, n2 = 16    .0112     .0129†    .0142    .0195  .0227  .0265   .0293  .0305  .0324  single tailed 1% 
v           n1 = 4, n2 = 16    .012      .0161†    .0197    .0238  .0204  .0359   .0407  .0433  .0465  two-tailed 1% 
v           n1 = 8, n2 = 4     .075      .0687     .0598    .0543  .0541  .0511‡  .0521  .0531  .056   single tailed 5% 
t1          n1 = 4, n2 = 16    .00011    .00043    .00310   .01    .0221  .0483   .0793  .0864  .133   single tailed 1% 
t1          n1 = 4, n2 = 16    .00007    .00031    .00244   .01    .0310  .0592   .1169  .1544  .222   two-tailed 1% 
t1          n1 = 8, n2 = 4     .1342     .1056     .0710    .05    .0368  .0287   .0246  .0224  .0204  single tailed 5% 

† P = .01 when K = .074 
‡ P = .05 when K = 3.6 

For equal sample sizes (v = $t_1$) the Type I error and power function curves, 
representing probability of Type I error and power function as a function of K, 
have a minimum when K is unity and a maximum when K is either zero or 
infinity. Maximum values of the probability of Type I error for several equal 
sample sizes are given in Table IV. It appears that for equal sample sizes the 
probability of Type I error and the power function are likely to be insensitive 
to the variation of K. We also notice in this connection that while the single 
tailed values of the probability of Type I error are less than those of the two 
tailed values, the values of the two tailed power function for δ = 1 are less 
than the corresponding single tailed values. This appears to be true also for the 
statistic v when $n_1 \ne n_2$. For unequal sample sizes also the probability of Type I 
error and the power function of $t_1$ are likely to be more sensitive to the variation 
of K than those of v. It may be pointed out in the sequel that while it is recognized 
that for unequal d.o.f. a fair comparison of the probability of Type I error and 
the power function of v with those of $t_1$ ought to adjust v and $t_1$ to the same level 
of significance, namely the same maximum (for all K) probability of Type I 
error, this would not alter our conclusions about the sensitive nature of $t_1$. 


TABLE III⁴ 
Variations in the asymmetric and symmetric power function of $t_1$ and v corresponding to the 
5% point of tabulated t (δ = 1) 

Statistic   d.o.f.            K = 0    .5      1        2       ∞ 
v = t1      n1 = n2 = 3       .189     .141    .137     .141    .189    symmetric 
v = t1      n1 = n2 = 3       .269     .229    .225⁵    .229    .269    asymmetric 
t1          n1 = 8, n2 = 4    .354     .202    .152     .112    .063    symmetric 
t1          n1 = 8, n2 = 4    .428     .294    .242⁵    .194    .122    asymmetric 
v           n1 = 8, n2 = 4    .208     .196    .162     .156†   .168    symmetric 
v           n1 = 8, n2 = 4    .286     .299    .247     .244‡   .255    asymmetric 

† minimum of .152 is reached for K = 3.6. 
‡ minimum of .242 is reached for K = 3.6. 


TABLE IV 
Maximum probability of Type I error of v (= $t_1$) for equal degrees of freedom 

                     Symmetric            Asymmetric 
n1 + 1 = n2 + 1      5%       1%          5%       1% 
 7                   .0721    .0224       .0625    .0182 
 9                   .0668    .0193       .0595    .0162 
11                   .0635    .0173       .0576    .0150 
15                   .0598    .0152       .0555    .0136 
21                   .0569    .0137       .0538    .0125 


2.6. Statistic v, Scheffé’s test and paired difference t. If K is known, v or Scheffé’s 
statistic S should not be used. If K is unknown, S is an ingenious device for 
getting a Student’s ¢ with min(n;, n2) d.o.f. and provides the only exact un- 
biased test so far known. In such a situation since nothing is known about K, a 
fair comparison of the power function of S with v ought to adjust v to the same 
maximum probability of Type I error for all K (maximum will occur for K = 0 
or K = & according as n; 2 ne); and at such a maximum significance level it is 

4 The author acknowledges with pleasure the help given in the preparation of this table 
by Miss Elizabeth Shuhany of the Statistical Laboratory, Boston University. 

5 Values taken from [7]. 
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recognized that v cannot be uniformly better than S. For samples of equal 
size n the use of the paired difference ¢ with n — 1 d.o.f. (equivalent to S when 
n = Nn ; Section 1) provides a suitable test for two reasons: (i) it is exact and 
(ii) as shown by Walsh [8] has a high power efficiency. 

If any approximate a priori information about K is available, » appears to 
be the only suitable statistic to utilize such information. While S was not intended 
to cope with such a situation, t (Section 2.4) has been shown to be unsuitable. 
Since v is insensitive to the variation of K, we shall not be far wrong in using 
‘effective’ d.o.f. based upon an assumed value k of K satisfying some such relation 
as4 <|k— K| < 2. The effective d.o.f. of v as given by Welch [1] and as given 
by P = P(v > t) or by P = P(|v| => to) for fixed P (listed in Table V as caleu- 

are 3 os see ae ae - +: = m(n; + 1) 
lated d.o.f.) are identical for K = 0,1, an © (mn; = ne) and (ii) K = 0, a ae 
and ©(n,; # nm). For other values of K it appears from Table V that Welch’s 
formula errs on the conservative side. The effective number of d.o.f. vary between 
nm, + m2 and min(n, nz) (ef. d.of. for S). Consequently in the absence of any 


TABLE V 
Adjusted power function of v in the light of 'effective' degrees of freedom 

                      Adjusted asymmetric power function of v
                      for probability of Type I error of .05              Effective d.o.f.
                      δ = 1                   δ = 2                   Calculated               Welch's formula
Sample size    K =    0    .25   4    ∞       0    .25   4    ∞       0    .25    4      ∞     0    .25    4      ∞
n1+1 = n2+1 = 3       .174 .204 .204 .174     .384 .476 .476 .384     2    3.36   3.36   2     2    2.94   2.94   2
n1+1 = n2+1 = 7       .225 .236 .236 .225     .550 .581 .581 .550     6    9.14   9.14   6     6    8.82   8.82   6
n1+1 = 9; n2+1 = 5    .210 .227 .242 .233     .504 .556 .594 .572     4    6.50   11.90  8     4    5.14   11.90  8





best unbiased test and in the light of any approximate information about K it 
would appear that v has a decided advantage over any other statistic. 
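
As an illustration of how an 'effective' number of d.o.f. responds to K, the following minimal sketch evaluates the familiar Welch-Satterthwaite approximation. It is only an assumption that this is the formula meant by "Welch's formula" [1]; the function name `welch_effective_dof`, the reading K = σ1²/σ2², and the convention that the samples contain n1 + 1 and n2 + 1 observations are likewise assumptions made for the example.

```python
def welch_effective_dof(K, n1, n2):
    """Welch-Satterthwaite d.o.f. for the variance estimate s1^2 + s2^2.

    Assumptions for this sketch: K = sigma1^2 / sigma2^2, the two samples
    contain n1 + 1 and n2 + 1 observations, and each s_i^2 (the estimated
    variance of a mean) carries n_i d.o.f.
    """
    lam1 = K / (n1 + 1)    # variance of the first mean, in units of sigma2^2
    lam2 = 1.0 / (n2 + 1)  # variance of the second mean, same units
    return (lam1 + lam2) ** 2 / (lam1 ** 2 / n1 + lam2 ** 2 / n2)


if __name__ == "__main__":
    cases = [(0.0, 8, 4), (0.25, 2, 2), (0.25, 6, 6), (0.25, 8, 4), (3.6, 8, 4)]
    for K, n1, n2 in cases:
        print(K, n1, n2, round(welch_effective_dof(K, n1, n2), 2))
```

Under these assumptions the approximation returns n2 at K = 0, reaches n1 + n2 at K = n1(n1+1)/[n2(n2+1)] (3.6 for n1 = 8, n2 = 4), and lies between min(n1, n2) and n1 + n2 elsewhere, which is the range quoted above for the effective d.o.f.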
2.7. The Behrens-Fisher test in repeated sampling. Consider the statistic 

$$d = (\bar{x} - \bar{x}')(s_1^2 + s_2^2)^{-1/2} = t_1 \sin\theta - t_2 \cos\theta,$$

where $s_1^2$ and $s_2^2$ are the unbiased estimates of the variances of the means $\bar{x}$ and $\bar{x}'$ 
respectively, $t_1$ and $t_2$ have independent "Student's" distributions with $n_1$ and $n_2$ 
d.o.f. respectively, and $\tan\theta = s_1/s_2$. On the basis of the "fiducial" distribution of 
$\sigma_1^2$ and $\sigma_2^2$ Fisher [6] regards d as a "mixture" of $t_1$ and $t_2$ with constant coefficients. 
It is to be noted that if $s_1$ and $s_2$ are fixed in the classical sense, $t_1$ and $t_2$ have 
independent normal conditional distributions with zero means and variances 
$\sigma_1^2/s_1^2$ and $\sigma_2^2/s_2^2$ respectively; and if $s_1$ and $s_2$ vary in their own distribution d is 
identical with v (Section 1). 

Neyman [9] considered the integral of the joint probability law of $\bar{x}$, $\bar{x}'$, $s_1^2$, $s_2^2$ 
over the set 

$$\frac{\bar{x} - \bar{x}'}{\sqrt{s_1^2 + s_2^2}} < t_1 \sin\theta - t_2 \cos\theta,$$

where the quantity on the right also 
depends upon $s_1$ and $s_2$ and is the quantity d tabulated by Sukhatme [10], [11]. 
Neyman showed in particular that if pairs of normal populations with different K 
are sampled ($n_1 + 1 = 13$, $n_2 + 1 = 7$), then the relative frequency of correct 
statements about $m_1 - m_2$ based on the 5% points of d will not be equal to the 
expected .95 and will vary with K. 

We consider here the following similar type of question: what is the nature of 
discrepancies that will arise in the probability of Type I error by the repeated 
use of the Behrens-Fisher test in sampling from two normal populations? We 
observe that since d and v have the same structural form, the appropriate 
probability of Type I error in such a situation will be given by the probability 
integral of v (Sections 2.2 and 2.5). 











TABLE VI 
Minimum and maximum† values of P(|v| > d0) for different values of K 

                              K =  0       .5      1       2       ∞       d0
n1+1 = n2+1 = 7        min         .05     .0321   .0307   .0321   .05     2.447
                       max         .0508   .0329   .0313   .0329   .0508   2.435
n1+1 = n2+1 = 9        min         .05     .0362   .0346   .0362   .05     2.306
                       max         .0512   .0367   .0358   .0367   .0512   2.292
n1+1 = n2+1 = 13       min         .05     .0405   .0396   .0405   .05     2.179
                       max         .0507   .0434   .0403   .0434   .0507   2.170
n1+1 = 7, n2+1 = 13    min         .0307   .0281   .0317   .0393   .05     2.447
                       max         .05     .0460   .0516   .0597   .0720   2.179
n1 = n2 = ∞                        .05     .05     .05     .05     .05     1.960

† Maximum values (bold type in the original) are given in the second line of each pair. 


We observe that P(|v| > x) is a monotone decreasing function of x for any 
fixed K, $n_1$ and $n_2$. Furthermore for fixed x, $n_1$ and $n_2$ we have 
$\partial P/\partial K \gtrless 0$ for (i) $K \gtrless 1$, $n_1 = n_2$ and (ii) 
$K \gtrless n_1(n_1+1)/[n_2(n_2+1)]$, $n_1 \ne n_2$. Table VI gives the minimum 
and maximum values of P(|v| > $d_0$) for different values of K where $d_0$ corre- 
sponds to the highest and lowest value of tabulated d. It appears that for equal 
sample sizes the minimum probability of Type I error is less than .05 and will 
converge to .05 when K is either infinity or zero. The maximum probability of 
Type I error converges to a value slightly higher than .05. This probability also 
converges to .05 with increasing size of equal samples for every K. For unequal 
sample sizes, e.g. $n_1 < n_2$, the minimum values converge to .05 when K = ∞, and 
if $n_1 > n_2$, this convergence takes place when K = 0. The maximum values 
are both greater and less than .05. 
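
The dependence of P(|v| > d0) on K can be explored by simulation. The sketch below is illustrative only: the function name `type1_error_v`, the reading K = σ1²/σ2², and the convention that the samples hold n1 + 1 and n2 + 1 observations (so that each estimated variance of a mean carries n_i d.o.f.) are assumptions made for the example.

```python
import math
import random


def type1_error_v(K, n1, n2, d0, reps=100_000, seed=1):
    """Monte Carlo estimate of P(|v| > d0) when the two means are equal.

    Assumptions for this sketch: K = sigma1^2 / sigma2^2 (with sigma2 = 1),
    sample sizes n1 + 1 and n2 + 1, and s_i^2 estimating the variance of the
    i-th mean with n_i d.o.f.
    """
    rng = random.Random(seed)
    var1 = K / (n1 + 1)    # true variance of the first mean
    var2 = 1.0 / (n2 + 1)  # true variance of the second mean
    sd_diff = math.sqrt(var1 + var2)
    hits = 0
    for _ in range(reps):
        diff = rng.gauss(0.0, sd_diff)
        # A chi-square variate with n d.o.f. is Gamma(shape=n/2, scale=2).
        s1_sq = var1 * rng.gammavariate(n1 / 2, 2.0) / n1
        s2_sq = var2 * rng.gammavariate(n2 / 2, 2.0) / n2
        if abs(diff) > d0 * math.sqrt(s1_sq + s2_sq):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for K in (0.0, 0.5, 1.0, 2.0):
        print(K, type1_error_v(K, 6, 6, 2.447))
```

With n1 = n2 = 6 and d0 = 2.447 the estimate should be close to .05 at K = 0 (where v reduces to Student's t with 6 d.o.f.) and close to .031 at K = 1 (Student's t with 12 d.o.f.), up to Monte Carlo error.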





3. Hypothesis of equality of regression coefficients when residual variances 
are unequal. 


3.1. Unbiasedness of tests based on statistics t* and v*. Consider 

$$t^* = (b_1 - b_2)\left[\frac{S(y - Y)^2 + S'(y' - Y')^2}{n_1 + n_2 - 2}\left(\frac{1}{M_1} + \frac{1}{M_2}\right)\right]^{-1/2}$$

and 

$$v^* = (b_1 - b_2)\left[\frac{S(y - Y)^2}{M_1(n_1 - 1)} + \frac{S'(y' - Y')^2}{M_2(n_2 - 1)}\right]^{-1/2},$$

where $b_1$ and $b_2$ are regression coefficients calculated from independent samples; Y 
and Y' are the sample regression functions; $M_1 = S(x - \bar{x})^2$ and $M_2 = S'(x' - \bar{x}')^2$. 
Under the assumptions of Section 1 these two quantities are distributed as 

$$t^* = (\xi + \Delta)(\mu_1\chi^2_{1,n_1-1} + \mu_2\chi^2_{2,n_2-1})^{-1/2},$$
$$v^* = (\xi + \Delta)(\lambda_1\chi^2_{1,n_1-1} + \lambda_2\chi^2_{2,n_2-1})^{-1/2},$$

respectively, where ξ is N(0, 1) and the χ²'s have independent χ²-distributions 
with d.o.f. indicated in the second subscripts, and where 

$$M_1/M_2 = w, \qquad \sigma_1^2/\sigma_2^2 = K, \qquad \Delta = (\beta_1 - \beta_2)\left(\frac{\sigma_1^2}{M_1} + \frac{\sigma_2^2}{M_2}\right)^{-1/2},$$
$$\mu_1 = K(w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1}, \qquad \mu_2 = (w + 1)(K + w)^{-1}(n_1 + n_2 - 2)^{-1},$$
$$\lambda_1 = K(K + w)^{-1}(n_1 - 1)^{-1}, \qquad \lambda_2 = w(K + w)^{-1}(n_2 - 1)^{-1}.$$

Consequently these two statistics have the same basic distribution as obtained 
previously (Section 2.1) and their power functions are monotone increasing 
functions of the standardized 'distance' Δ for fixed values of K, w, $n_1$ and $n_2$. 
While the statistic t* has "Student's" distribution with $n_1 + n_2 - 2$ d.o.f. 
whenever K = 1, the statistic v* is only so distributed when $K = w(n_1 - 1)/(n_2 - 1)$. 

3.2. Variations in the probability of Type I error and power function of t* and v*. 
The behavior of the partial derivatives of the probability of Type I error and 
the power function of t* and v* w.r.t. K and also in relation to w is essentially 
the same. For purposes of illustration we shall only consider the behavior of the 
probability of Type I error. We shall presently see that for the hypothesis 
$\beta_1 = \beta_2$ (cf. "Student's" hypothesis $m_1 = m_2$) while t* is sensitive to the variation 
of K and w, v* is insensitive to both. 

3.2.1. Variations w.r.t. K for fixed w. Remembering that the x”s in the de- 
nominator of ¢* have respectively n, — 1 and n2 — 1 d.o.f., we can write down 
P(t* > t) from the corresponding form for ¢, (Section 2.3). After simplification 
we obtain 


(3.2.1.1) oe < lie - ) -ee- c+ we (K <1), 
where $z_0 = (1 + \mu_1 t_0^2)^{-1}$. If we make use of the relation $P(n_1, n_2, M_1, M_2, K) = 
P(n_2, n_1, M_2, M_1, K^{-1})$ in (3.2.1.1) we obtain 


(3.2.1.2) a i+ ea =< 1) ~~ (K > 1), 


where L, and Ly» are certain positive constants independent of M,, M. and K. 
Similarly for the statistic v* we have 


oP 


(3.2.1.3) aK < Di(K¢)"[(m2 — 1) — w(m — 1)¢]/(K + w) (Ko < 1) 
and 
(3.2.1.4) am > D[(nz — 1) — w(m — 1)¢]/(K + w) (K¢ > 1), 


where D, and D, are certain positive constants independent of K, M; and M, and 


ne — I Ne — 1 
NE 7, ° h: : . one : — os i 2 
aim > bi. We notice that if (i) m. = nz and w = 1 or (ii) w ——e 


we have ¢* = v* and both from (8.2.1.1), (3.2.1.2) and from (8.2.1.3), (3.2.1.4) 


where ¢ = 





2 
we obtain - = 0 for K = 1. In the case (i) the maximum probability of Type I 


error occurs at K = © and K = 0. In case (ii) the maximum will sometimes 
occur for K = 0 and sometimes for K = , depending on the relative magnitude 
of nm, and ne. 

For other situations ‘* and v* exhibit a type of behavior essentially similar 
to that of ¢,; and v (Section 2.5). We notice that the (P, K) curve for v* has a 
a eed If nm, = ne, the minimum point is given by 
Kk = w. Therefore with an approximate knowledge of K, a useful practical hint 
to remember is to so adjust M, and M; as to have w approximately equal to K. 
If n; ¥ ne any information about oj being greater or less than oj can be used 
with decided advantage to adjust M, , M2 , n, and m so as to reduce considerably 
the risk of the first kind and thus work in a region of the (P, K) curve where 
there is not much danger of bias in the probability of Type I error. This will 
also reduce the fluctuations of the power function of v about its minimum which 
w(n — 1) 

a | c 

3.2.2. Variations in relation to w for fixed K. The partial derivative of P(t* > t) 

with respect to w is given by 


minimum when K = 





also occurs for K = 





OF = 40 — KK" (K + wy? Da — KY 
ow h=0 
(3.2.2.1) r(™ - 5 4 h) rae fi.» ) 
9 «<0 


celta — T\ osm + me — 2 
ane) a(R? 4 0,2) 


(K <1). 


Therefore 
$$\frac{\partial P}{\partial w} > 0$$
for K < 1. 
Similarly 
oP 
dace 
Ow 
for K > 1. 


To justify the differentiation of the series in (3.2.2.1) we make use of the result 


_ — 
if ee eee +h, :) wi 1,(mtm—? + & +1, 1) 


— — 


+n2—2)/2+h 
zo"! n2—2)/ (1 _ z0)' 


(CREP i)a(MER=F a 


and consequently the series under consideration may be shown to be dominated 
by an absolutely and uniformly convergent series for0 < K < 1. 
For the statistic v* consider 


Poot > t) = HKG)" D1 — Koy (BE + 8) 
h=0 





2 


—1 
rw + pr(™ > ‘)| i (* + =—* +h, 1) (K $ <1) 


—_ 


(3.2.2.2) 








where $y_0 = (1 + \lambda_1 t_0^2)^{-1}$. We notice from (3.2.2.2) and from the form of quantities 
$\lambda_1$ and $\lambda_2$ (Section 3.1) that P(v* > t) depends on K and w only through the 
product of K and 1/w. Consequently variations of P w.r.t. 1/w for fixed K 
are the same as those of P w.r.t. K for fixed w. Thus we may directly infer that 
P(v* > t) will be insensitive to the variations of w. The following Table VII 
will illustrate the nature of variations in the probability of Type I error in the 
tests based on t* and v* in relation to w. 


TABLE VII 
Variations in the probability of Type I error of t* and v* 
(K = 2; n1 = n2 = 7; t0 = 1.782) 

w               0       .25     .5      1       2       ∞
P(t* > t0)      .0259   .0358   .0427   .0512   .0594   .0866
P(v* > t0)      .0625   .0570   .0539   .0512   .05     .0625
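
The behavior of P(v* > t0) in relation to w can be checked by simulating the representation of v* given in Section 3.1 with Δ = 0. This is a minimal sketch, not the series used by the author; the function name `type1_error_vstar` is hypothetical, and λ1 = K/((K + w)(n1 - 1)), λ2 = w/((K + w)(n2 - 1)) are taken as read from Section 3.1.

```python
import math
import random


def type1_error_vstar(K, w, n1, n2, t0, reps=100_000, seed=1):
    """Monte Carlo estimate of P(v* > t0) under beta1 = beta2 (Delta = 0)."""
    rng = random.Random(seed)
    lam1 = K / ((K + w) * (n1 - 1))
    lam2 = w / ((K + w) * (n2 - 1))
    hits = 0
    for _ in range(reps):
        xi = rng.gauss(0.0, 1.0)
        chi1 = rng.gammavariate((n1 - 1) / 2, 2.0)  # chi-square, n1 - 1 d.o.f.
        chi2 = rng.gammavariate((n2 - 1) / 2, 2.0)  # chi-square, n2 - 1 d.o.f.
        if xi > t0 * math.sqrt(lam1 * chi1 + lam2 * chi2):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for w in (0.0, 0.25, 0.5, 1.0, 2.0):
        print(w, type1_error_vstar(2.0, w, 7, 7, 1.782))
```

For K = 2 and n1 = n2 = 7 the estimate should be near .05 at w = 2, where v* has "Student's" distribution with 12 d.o.f., and near .0625 at w = 0, where only the first sample contributes and 6 d.o.f. remain.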


It would appear that on the analogy of statistics t and v for the comparison of 
two means one could guess about the sensitive nature of t* in relation to the 
variations of the 'nuisance' parameter K. The additional drawback in t* which 
stems from the monotone nature of its variations with respect to w is a further 
warning against the use of a t* type statistic for the hypothesis $\beta_1 = \beta_2$ when 
$\sigma_1 \ne \sigma_2$. 

4. Hypothesis of equality of two linear regression functions when variances 
are unequal. 


4.1. The statistic Z. (For notation refer to Sections 2.1 and 3.1). Consider the 
model given in Sections 1 and 3 for the comparison of two regression coefficients. 
If the variances are equal, the statistic based on the likelihood ratio criterion 
for the composite hypothesis $\alpha_1 = \alpha_2$ and $\beta_1 = \beta_2$ is given by 

$$Z = \frac{(\bar{y} - \bar{y}')^2 (n_1 + 1)(n_2 + 1)(n_1 + n_2 + 2)^{-1} + (b_1 - b_2)^2 M_1 M_2 (M_1 + M_2)^{-1}}{S(y - Y)^2 + S'(y' - Y')^2}.$$

The quantity Z is distributed like the ratio of two independently distributed χ²'s 
and consequently its distribution is precisely determined under the hypothesis. 
If $\sigma_1^2 \ne \sigma_2^2$, Z can be put in the form of 

$$Z = (a_1\chi^2_{1,1} + a_2\chi^2_{2,1})(K\chi^2_{3,n_1-1} + \chi^2_{4,n_2-1})^{-1},$$

which is now distributed as the ratio of 'mixtures' of independently distributed 
χ²'s with d.o.f. indicated in the second subscripts and where 

$$a_1 = [n_1 + 1 + K(n_2 + 1)](n_1 + n_2 + 2)^{-1}, \qquad a_2 = (K + w)(1 + w)^{-1}.$$

In the non-null case when $\alpha_1 \ne \alpha_2$, $\beta_1 \ne \beta_2$ the numerator of Z is a mixture of 
non-central squares. If we let $\beta(K, w, \delta, \Delta, n_1, n_2)$ denote the power function 
of Z, following Robbins and Pitman [12] we obtain 


B(K, w, 6, A, m1, n2) = Dy 2, DC; dipel (at > +h~ lk+jt+ 1) 





j=0 h=0 k=0 


- n+l 
I 1, w 
(x > ,o< MEN), 


(4.1.1) 





where 


"i bia 
9)’ I 5 , 
{= nel ED (1 — a;/az)’, 


xorg or J + i\(1 a 4) 
2 K 


‘i= ——_—_—_— ' 
r(™ = ') h! 

m= ee? (1D)*/k! (DP =8 
¢ = (i+ Z/a)-. 
4.2. Variations in the probability of Type I error and the power function of Z. 
Corresponding to (4.1.1) we obtain the expression for the probability of Type I 
error $P(Z > Z_0)$ by putting D = 0 and k = 0. It has not been possible to establish 
any definite law concerning the behavior of the probability of Type I error 
and the power function w.r.t. the 'nuisance' parameter K. However we shall 
presently establish their monotone dependence on the variable parameter w. 

We differentiate $P(Z > Z_0)$ with respect to w and after simplification obtain 


ins aT (j + 3) a\? ay. a,\*" 
5p = K — Wla/aytas (2 -2) - $5(1 - =) | 


7 (mt nm as . =~ 2 (a,/a2)* rj + 3/2) 
(mam ss i +1) (K+ ul +0) *- jira) 


[1(mt eho 274 1)-1("4™ 441,542) ] <0 


for K > 1, $w < (n_1 + 1)/(n_2 + 1)$. Similarly, by utilizing an appropriate expression for 
$P(Z > Z_0)$ for K > 1, $w \ge (n_1 + 1)/(n_2 + 1)$, we can show that $\partial P/\partial w < 0$. For the case 
K < 1 it can be shown that $P(Z > Z_0)$ is a monotone increasing function of w. 
This is also true of the dependence of the power function of Z on w. 
4.3. Unbiasedness of Z. We differentiate (4.1.1) w.r.t. 6 and A and after 


simplification obtain = 0, > 0. Thus the power function of Z has a relative 





minimum at 6 = 0, A = 0. 
The author is greatly indebted to Professors Harold Hotelling and William G. 
Madow for guidance in this research and to the referees for many useful sug- 
gestions and criticisms. 


THE EXTREMAL QUOTIENT 


By E. J. Gumbel and R. D. Keeney 


New York City and Metropolitan Life Insurance Company 


Summary. The extremal quotient is defined as the ratio of the largest to the 
absolute value of the smallest observation. Its analytical properties for sym- 
metrical, continuous and unlimited distributions are obtained from a study of 
the auto-quotient defined as the ratio of two non-negative variates with identi- 
cal distributions. The relation of the two statistics is established by proving 
that, for sufficiently large samples from an initial distribution with median zero, 
the largest (or smallest) value may be assumed to be positive (or negative) 
and that the extremes are independent. It follows that the distribution and the 
probability of the extremal quotient possess certain symmetries, and that its 
median is unity. As many moments exist for the extremal quotient as moments 
and reciprocal moments exist simultaneously for the initial variate. The loga- 
rithm of the extremal quotient is symmetrically distributed. These properties 
hold for all continuous symmetrical unlimited variates which possess a mono- 
tonically increasing probability function. 

For the exponential type, the asymptotic distribution of the extremal quo- 
tient can only be expressed by an integral. In this case, no moments exist. For 
the Cauchy type, the asymptotic distribution is very simple, and the logarithm 
of the extremal quotient has the same distribution as the midrange for initial 
distributions of the exponential type. 

It is not necessary to consider asymmetrical distributions since, in this case, 
for sufficiently large samples, one of the extremes will outweigh the other, 
unless the distribution is nearly symmetrical or has rapidly varying tails. 


1. The auto-quotient and the extremal quotient. Let x and y be two inde- 
pendent non-negative continuous variates, unlimited to the right. Let $f_1(x)$ and 
$f_2(y)$ be the distributions (probability densities), and let $F_1(x)$ and $F_2(y)$ be 
the probability functions. Then the joint distribution of the two variates is 
their product. The quotient 

(1.1) $Q = x/y$ 

is also non-negative and unlimited to the right. Since 

$$x = yQ; \qquad \partial x/\partial Q = y,$$

the joint distribution w(y, Q) of the quotient Q and the variate y is 

(1.2) $w(y, Q) = f_1(yQ)\, f_2(y)\, y,$ 

and the marginal distribution h(Q) of the variate Q alone becomes 

(1.3) $h(Q) = \int_0^\infty y f_1(yQ) f_2(y)\, dy.$ 

The quotient Q possesses a mode if (and only if) $f_1(x)$ possesses a mode. 
Assume now that the two variates x and y have the same distribution 

(1.4) $f_1(x) = f(x); \qquad f_2(y) = f(y)$ 

with the same parameter values. The quotient of two variates with identical 
distributions is henceforth called the auto-quotient $q_a$. It may be realized if there 
are two independent series of observations taken from the same population and 
ordered in time. Each value from the first series is divided by the corresponding 
value from the second series. Another realization consists in dividing each value 
obtained in one series of independent observations by every other value. A 
third realization is obtained by considering two asymmetrical distributions 
$f_1(x)$ and $f_2(y)$ where $x \ge 0$, $y \le 0$, and 

(1.4') $f_2(y) = f_1(-y).$ 

The two distributions are called mutually symmetrical, and the auto-quotient 
is 

$$q_a = x/(-y).$$


From the definition of the auto-quotient it follows that the distribution of $q_a$ 
must be the same as the distribution of its reciprocal $r = 1/q_a$. The proof of this 
statement is simple. Under the condition (1.4), the distribution $h(q_a)$ becomes, 
from (1.3), 

(1.5) $h(q_a) = \int_0^\infty y f(y q_a) f(y)\, dy.$ 

The distribution $h_1(r)$ of the reciprocal is 

$$h_1(r) = \frac{1}{r^2}\int_0^\infty y f(y/r) f(y)\, dy.$$

If y/r is replaced by z, the distribution of r is 

(1.6) $h_1(r) = h(q_a),$ 

that is, $h_1$ is the same function as h. 
Thus, the distribution of the auto-quotient of a non-negative unlimited variate 
is invariant under a reciprocal transformation. 

The shape of the distribution $h(q_a)$ and the location of the mode may be ob- 
tained from the density of probability $h(1/q_a)$ at the value $1/q_a$ (which differs, 
of course, from the distribution $h_1(r)$ of $r = 1/q_a$). From (1.5) we obtain 

$$h(1/q_a) = \int_0^\infty y f(y/q_a) f(y)\, dy.$$

The transformation 

$$y/q_a = z; \qquad dy = q_a\, dz,$$

leads to 

(1.7) $h(1/q_a) = q_a^2\, h(q_a).$ 

This is a symmetry relation for the distribution of the auto-quotient of a non- 
negative unlimited variate. If $q_a$ is larger than unity, 

(1.8) $h(1/q_a) > h(q_a).$ 

If the distribution $h(q_a)$ is continuous for all values of $q_a$, the derivative of 
equation (1.7) with respect to $q_a$ leads, for $q_a = 1$, to 

(1.9) $h'(1) = -h(1).$ 

If the distribution $h(q_a)$ possesses a unique mode, it must be less than unity. 
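
The relations above can be checked numerically for a concrete initial distribution. The sketch below assumes, purely for illustration, the unit exponential density f(y) = exp(-y), for which (1.5) has the closed form h(q) = 1/(1 + q)²; it evaluates (1.5) by Simpson's rule and compares the result with the symmetry relation read as h(1/q) = q² h(q) and with (1.9). The helper names are hypothetical.

```python
import math


def f_exp(y):
    """Unit exponential density (an illustrative choice of initial distribution)."""
    return math.exp(-y)


def h_numeric(q, f=f_exp, upper=60.0, steps=60_000):
    """Approximate (1.5), h(q) = int_0^inf y f(yq) f(y) dy, by Simpson's rule."""
    dy = upper / steps
    total = 0.0
    for i in range(steps + 1):
        y = i * dy
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * y * f(y * q) * f(y)
    return total * dy / 3.0


def h_exact(q):
    """Closed form of (1.5) for the unit exponential."""
    return 1.0 / (1.0 + q) ** 2


if __name__ == "__main__":
    for q in (0.5, 1.0, 2.0):
        print(q, h_numeric(q), h_exact(q))
    print("h(1/2) vs 4*h(2):", h_numeric(0.5), 4 * h_numeric(2.0))
    eps = 1e-3
    slope = (h_numeric(1 + eps) - h_numeric(1 - eps)) / (2 * eps)
    print("h'(1) vs -h(1):", slope, -h_numeric(1.0))
```

In this example the density decreases from q = 0 onward, so the mode lies at zero, below the median of unity.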
The moments $\overline{q_a^k}$ are, from (1.5), 

$$\overline{q_a^k} = \int_0^\infty\!\!\int_0^\infty q_a^k\, y f(q_a y) f(y)\, dy\, dq_a = \int_0^\infty \frac{f(y)}{y^k}\int_0^\infty (q_a y)^k f(q_a y)\, d(q_a y)\, dy.$$

The inner integral is the moment $\overline{y^k}$ of order k of the initial variate y, and the 
remaining integral is its reciprocal moment $\overline{y^{-k}}$ of order -k. Thus 

(1.10) $\overline{q_a^k} = \overline{y^k}\cdot\overline{y^{-k}} = \overline{q_a^{-k}}.$ 

The moments of order k and of order -k of $q_a$ exist if the moments and the 
reciprocal moments of order k for the initial variate exist simultaneously. The 
second equation in (1.10) also follows immediately from the invariance of $q_a$ 
under a reciprocal transformation. Even if the initial distribution possesses all 
moments, the mean $\bar{q}_a$ need not exist, and the same holds, of course, for the mean 
error and the higher moments. The procedure, usual in economic and meteorolog- 
ical statistics, of calculating the quotients of two series of independent posi- 
tive variables in order to test whether this ratio is constant may be misleading, 
especially if the two series happen to be samples taken from the same population. 
The theoretical mean need not exist, and the calculated mean of the observed 
quotients need not characterize the relation between the two series. 
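
Relation (1.10) can be made concrete with a family for which both kinds of moments are known. The sketch below assumes, as an illustration not taken from the text, that x and y are independent gamma variates with shape parameter a and unit scale, for which the standard result E[y^k] = Γ(a + k)/Γ(a) holds whenever a + k > 0. The function name is hypothetical.

```python
import math


def auto_quotient_moment(k, shape):
    """k-th moment of q_a = x/y for x, y iid Gamma(shape, 1), using (1.10)."""
    if shape - abs(k) <= 0:
        return math.inf  # the moment or the reciprocal moment diverges
    m_k = math.gamma(shape + k) / math.gamma(shape)        # E[y^k]
    m_minus_k = math.gamma(shape - k) / math.gamma(shape)  # E[y^(-k)]
    return m_k * m_minus_k


if __name__ == "__main__":
    print(auto_quotient_moment(1, 3.0))   # mean of q_a for shape 3
    print(auto_quotient_moment(-1, 3.0))  # mean of 1/q_a, equal by (1.10)
    print(auto_quotient_moment(1, 1.0))   # exponential case: no mean
```

For shape 3 the mean quotient is 1.5 although the median is unity, and for the exponential (shape 1) the mean does not exist at all, which illustrates why the average of observed quotients may be misleading.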

The probability function H(Q) of the quotient Q obtained from (1.3) is 

$$H(Q) = \int_0^Q\!\!\int_0^\infty y f_1(zy) f_2(y)\, dy\, dz.$$

Change of the order of integration leads to 

$$H(Q) = \int_0^\infty f_2(y) F_1(Qy)\, dy.$$

The probability function $H(q_a)$ of the auto-quotient obtained from (1.4) is 

(1.11) $H(q_a) = \int_0^1 F(q_a y)\, dF.$ 

Integration by parts leads to 

(1.12) $H(q_a) = 1 - q_a\int_0^\infty F(y) f(q_a y)\, dy.$ 

The boundary condition, H(0) = 0; H(∞) = 1, can immediately be verified if 
the preceding equation is written in the form 

(1.13) $H(q_a) = 1 - \int_0^\infty F(z/q_a) f(z)\, dz.$ 


The probability $H(q_a)$ possesses a symmetry relation which is analogous to 
(1.7). The probability at the value $1/q_a$ is, from (1.11), 

$$H(1/q_a) = \int_0^\infty F(y/q_a) f(y)\, dy.$$

If we introduce the variable of integration 

$$y = q_a z,$$

we obtain from (1.12) 

(1.14) $H(q_a) = 1 - H(1/q_a).$ 

If $q_a$ is any quantile, such that $H(q_a) = P$, its reciprocal $1/q_a$ has the probability 
1 - P. The first quartile (decile) is the reciprocal of the third quartile (ninth 
decile), and so on. 

For $q_a = 1$, equation (1.14) leads to 

(1.14') $H(1) = \tfrac{1}{2}.$ 

The median of the auto-quotient of a positive unlimited variate is unity. From 
(1.9) it follows that the median surpasses the mode, if a unique mode exists. 

Finally, equation (1.14) may be used to construct a symmetrical distribution. 
If a new variate 

(1.15) $z = \lg q_a$ 

with the probability function H*(z) is introduced, the symmetry relation (1.14) 
becomes 

(1.16) $H^*(z) = 1 - H^*(-z).$ 

The logarithm of the auto-quotient of a positive unlimited variate has a sym- 
metrical distribution about median zero. The geometric mean of $q_a$ exists and is 
equal to unity. 

These results hold if each observed value of a non-negative unlimited variate 
is divided by each other observed value. They do not hold for the quotients of 
two specific order statistics because, in general, the fundamental assumption of 
independence no longer holds. However, some consequences for the quotients 
of extreme mth values may be deduced. 
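
The symmetry properties of the auto-quotient are easy to observe in simulation. The following sketch assumes, for illustration only, unit exponential initial variates, for which the probability function has the closed form H(q) = q/(1 + q); it estimates the median, H(1/2), H(2), and the mean of lg q. The function name is hypothetical.

```python
import math
import random


def simulate_auto_quotient(reps=100_000, seed=1):
    """Simulate q_a = x/y for independent unit exponential x and y."""
    rng = random.Random(seed)
    qs = sorted(rng.expovariate(1.0) / rng.expovariate(1.0) for _ in range(reps))
    median = qs[reps // 2]
    h_half = sum(1 for v in qs if v <= 0.5) / reps  # estimate of H(1/2)
    h_two = sum(1 for v in qs if v <= 2.0) / reps   # estimate of H(2)
    mean_log = sum(math.log(v) for v in qs) / reps  # log of the geometric mean
    return median, h_half, h_two, mean_log


if __name__ == "__main__":
    median, h_half, h_two, mean_log = simulate_auto_quotient()
    print("median:", median)
    print("H(1/2) + H(2):", h_half + h_two)
    print("mean of lg q:", mean_log)
```

Up to sampling error the median is unity, H(1/2) + H(2) equals one as required by (1.14), and the mean of lg q vanishes, so that the geometric mean is unity.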

Consider a symmetrical unlimited variate. Then the distribution ${}_m g({}_m x)$ 
of the mth smallest value ${}_m x$ and the distribution $g_m(x_m)$ of the mth largest value 
$x_m$ are mutually symmetrical in the sense of (1.4'). Therefore the extremal 
quotient 

(1.17) $q_m = \dfrac{x_m}{-{}_m x}$ 

may be interpreted as an auto-quotient provided that 1) the probability for 
$x_m$ to be negative, and ${}_m x$ to be positive, may be neglected; 2) the distributions 
of the mth smallest and the mth largest values are independent. Under these 
conditions the distribution, the moments, and the probability function of the 
extremal quotient are obtained from (1.5), (1.10), and (1.11) respectively, if 
the initial distribution f(y) is replaced by the distribution of the mth largest 
values $g_m(x_m)$. The symmetry relations (1.7) and (1.14) and their consequence, 
that the median is equal to unity, hold in particular for m = 1, i.e. for the ex- 
tremal quotient proper. 

The validity of the two conditions has now to be established. 

a) Consider a symmetrical distribution f(x) with median zero. Then the 
probability that the largest among n observations, $x_n$, exceeds a 
certain x is $1 - F^n(x)$. The probability P that the largest among n values is 
positive, i.e. larger than the median, is 

(1.18) $P = 1 - 2^{-n}.$ 

If n is sufficiently large, this probability differs from unity by an amount that 
can be made as small as we please. Even for relatively small samples, say n = 20, 
the probability that the largest value will be positive is of the order $1 - 10^{-6}$. 
Thus, we expect only one largest value in a million samples of size 20 to be nega- 
tive. The same argument shows that the smallest value $x_1$ may be expected to 
be negative. Thus the postulate 

(1.19) $x_n \ge 0; \qquad x_1 \le 0,$ 

is a very weak restriction upon the sample size. If m is sufficiently small, the 
same result holds for the mth extremes. 
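
The size of the neglected probability in (1.18) is easily tabulated. The short sketch below (function names are illustrative) gives the chance that the largest of n values falls at or below the median, and the chance that the largest is positive and the smallest negative at the same time; the latter is 1 - 2^(1-n) because "all values negative" and "all values positive" are mutually exclusive events.

```python
def prob_largest_not_positive(n):
    """Probability that all n values fall at or below the median: 2^(-n)."""
    return 0.5 ** n


def prob_both_extremes_as_postulated(n):
    """Probability that the largest value is positive and the smallest negative."""
    return 1.0 - 2.0 * 0.5 ** n


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, prob_largest_not_positive(n), prob_both_extremes_as_postulated(n))
```

For n = 20 the first quantity is 2^(-20), a little under one in a million, in agreement with the order of magnitude quoted above.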

b) It is known [7] that the joint distribution $w_n(x_1, x_n)$ of the extremes taken 
from an initial distribution of the exponential type converges, for sufficiently 
large samples, toward the product of the asymptotic distribution $g(x_n)$ of the 
largest value, and ${}_1 g(x_1)$ of the smallest value. A similar theorem will now be 
proven for a general class of continuous distributions. 

Let ${}_m x$ be the mth smallest observation; let $x_l$ be the lth largest observation, 
where m and l are small compared to n, n being large. Then the joint distribution 
$w_n({}_m x, x_l)$ is 

(1.20) $$w_n({}_m x, x_l) = \frac{n!}{(m-1)!\,(l-1)!\,(n-m-l)!}\, F({}_m x)^{m-1}\,(F(x_l) - F({}_m x))^{n-m-l}\,(1 - F(x_l))^{l-1} f({}_m x) f(x_l).$$

Now the transformation 

(1.21) $n(1 - F(x_l)) = \xi; \qquad nF({}_m x) = \eta; \qquad 0 \le \xi \le n, \qquad 0 \le \eta \le n,$ 

due to Cramér ([1], p. 371) is used. Then the joint distribution $v_n(\xi, \eta)$ of the 
new variates ξ and η becomes 

$$v_n(\xi, \eta) = \frac{n!}{n^2\,(m-1)!\,(l-1)!\,(n-m-l)!}\left(\frac{\eta}{n}\right)^{m-1}\left(1 - \frac{\xi + \eta}{n}\right)^{n-m-l}\left(\frac{\xi}{n}\right)^{l-1},$$

where m + l is small compared to n. As n increases, $v_n(\xi, \eta)$ converges to 

$$v(\xi, \eta) = \left(\frac{\xi^{l-1} e^{-\xi}}{(l-1)!}\right)\left(\frac{\eta^{m-1} e^{-\eta}}{(m-1)!}\right),$$

so that in the limit ξ and η are independent. If now the mild restriction is im- 
posed that F(x) be monotonically increasing, (1.21) defines a one-to-one transfor- 
mation, and therefore there must exist an inverse function uniquely defining 
${}_m x$ as a function of η, and $x_l$ as a function of ξ. From the limiting independence 
of ξ and η the limiting independence of the extremes ${}_m x$ and $x_l$ follows at once. 

Thus the second condition is fulfilled, and the mth extremal quotient shares 
all properties of the auto-quotient. This holds also for initial symmetrical dis- 
tributions which do not possess asymptotic distributions of the extremes. 
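
The limiting independence of the transformed extremes can be observed in a small simulation. The sketch below assumes, for illustration only, a uniform initial distribution on (0, 1), so that F(x) = x and the variates of (1.21) with m = l = 1 are simply ξ = n(1 - x_max) and η = n·x_min; the function name is hypothetical.

```python
import random


def extreme_correlation(n=100, reps=20_000, seed=1):
    """Sample means of xi and eta, and their correlation, for uniform samples."""
    rng = random.Random(seed)
    xis, etas = [], []
    for _ in range(reps):
        sample = [rng.random() for _ in range(n)]
        xis.append(n * (1.0 - max(sample)))  # xi of (1.21) with l = 1
        etas.append(n * min(sample))         # eta of (1.21) with m = 1
    mean_xi = sum(xis) / reps
    mean_eta = sum(etas) / reps
    cov = sum((a - mean_xi) * (b - mean_eta) for a, b in zip(xis, etas)) / reps
    var_xi = sum((a - mean_xi) ** 2 for a in xis) / reps
    var_eta = sum((b - mean_eta) ** 2 for b in etas) / reps
    return mean_xi, mean_eta, cov / (var_xi * var_eta) ** 0.5


if __name__ == "__main__":
    print(extreme_correlation())
```

Both means are close to unity, as for the limiting exponential law exp(-ξ), and the sample correlation is close to zero; for the uniform case the exact correlation is of order 1/n.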

In the following, the two types of initial distributions of an unlimited variate 
are considered for which asymptotic distributions of the extremes exist, namely, 
the exponential and the Cauchy type. For simplicity, only the extremal quotient 
proper, designated by gq, is studied. The two asymptotic probabilities of the 
extremal quotients for these symmetrical distributions are obtained by introduc- 
ing the asymptotic distributions of the largest value into the probability func- 
tion (1.11) of the auto-quotient. 


’ 





2. Application to the exponential type. For symmetrical distributions of the 
exponential type the asymptotic distribution of the largest value is 

(2.1) $g(x) = \alpha \exp[-\alpha(x - u) - e^{-\alpha(x - u)}],$ 

where u and α are defined in terms of the initial probability F(x) and the initial 
distribution f(x) by 

(2.2) $F(u) = 1 - 1/n; \qquad \alpha = n f(u),$ 

n being the sample size. The distribution (2.1) will now be simplified by intro- 
ducing a new parameter λ defined by 

(2.3) $e^{\alpha u} = \lambda > 0.$ 

To see the meaning of λ, consider Laplace's first distribution, then the so- 
called logistic [6], and the normal distributions, all of which are of the exponential 
type. In the first two cases we obtain, from (2.2), after some calculations, 

(2.4) $\alpha = 1,\ u = \lg n - \lg 2; \qquad \alpha = 1 - 1/n,\ u = \lg(n - 1),$ 

whereas for the normal distribution, we have asymptotically 

$$\alpha = u = \sqrt{2 \lg(n/\sqrt{2\pi})}$$

and 

(2.4') $\lambda = n^2/(2\pi).$ 

For these distributions, and interpreted in this sense, λ is of the order of the 
sample size or its square. 
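
The order of magnitude of λ can be evaluated directly from (2.2) to (2.4'). The sketch below assumes the unit-scale forms of the Laplace and logistic distributions (which reproduce the values of α and u quoted in (2.4)) and uses the asymptotic expressions of the text for the normal case; the function name `lambda_parameter` is hypothetical.

```python
import math


def lambda_parameter(kind, n):
    """Return (alpha, u, lam) with lam = exp(alpha * u), following (2.2)-(2.4')."""
    if kind == "laplace":      # assumed form: f(x) = exp(-|x|) / 2
        alpha, u = 1.0, math.log(n) - math.log(2)
    elif kind == "logistic":   # assumed form: F(x) = 1 / (1 + exp(-x))
        alpha, u = 1.0 - 1.0 / n, math.log(n - 1)
    elif kind == "normal":     # asymptotic expressions quoted in the text
        alpha = u = math.sqrt(2.0 * math.log(n / math.sqrt(2.0 * math.pi)))
    else:
        raise ValueError(kind)
    return alpha, u, math.exp(alpha * u)


if __name__ == "__main__":
    for kind in ("laplace", "logistic", "normal"):
        print(kind, lambda_parameter(kind, 100))
```

For n = 100 this gives λ = n/2 = 50 for the Laplace case, a value somewhat below n for the logistic case, and λ = n²/(2π) for the normal case.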


From (2.3) and (2.1) the distribution g(x) and the probability function Φ(x) 
are 

(2.5) $g(x) = \alpha\lambda \exp[-\alpha x - \lambda e^{-\alpha x}]; \qquad \Phi(x) = \exp[-\lambda e^{-\alpha x}].$ 

In order to fulfill the condition (1.19), namely Φ(0) = 0, the distribution g(x) 
must be truncated at x = 0. This leads to the truncated distribution $g_t(x)$ and 
the truncated probability $\Phi_t(x)$ where 

(2.6) $g_t(x) = \dfrac{\alpha\lambda \exp[-\alpha x - \lambda e^{-\alpha x}]}{1 - e^{-\lambda}}; \qquad \Phi_t(x) = \dfrac{\exp[-\lambda e^{-\alpha x}] - e^{-\lambda}}{1 - e^{-\lambda}}.$ 

The asymptotic probability function $H_\lambda(q)$ for the extremal quotient of a sym- 
metrical variate of the exponential type is now obtained from (1.11), if y, f(y), 
and F(y) are replaced by x, $g_t(x)$ and $\Phi_t(x)$, respectively, and the index a is 
dropped. Consequently, from (2.6), 

$$H_\lambda(q) = \frac{1}{(1 - e^{-\lambda})^2}\int_0^\infty \alpha\lambda \exp[-\alpha x - \lambda e^{-\alpha x} - \lambda e^{-\alpha q x}]\, dx - \frac{e^{-\lambda}}{(1 - e^{-\lambda})^2}\int_0^\infty \alpha\lambda \exp[-\alpha x - \lambda e^{-\alpha x}]\, dx.$$

The transformation 

$$e^{-\alpha x} = z; \qquad \alpha e^{-\alpha x}\, dx = -dz$$

leads to 

(2.7) $H_\lambda(q) = \dfrac{\lambda}{(1 - e^{-\lambda})^2}\int_0^1 e^{-\lambda(z + z^q)}\, dz - \dfrac{e^{-\lambda}}{1 - e^{-\lambda}}.$ 

This probability of the extremal quotient for initial symmetrical distributions 
of the exponential type is not truly asymptotic since the parameter λ depends 
upon n. (See Addendum). 

Unfortunately, the expression (2.7) cannot be integrated. Therefore the prob- 
ability function has to be studied in an analytic way. For this purpose we first 
recall the general properties 

$$H_\lambda(0) = 0; \qquad H_\lambda(1) = \tfrac{1}{2}; \qquad H_\lambda(\infty) = 1,$$

valid for any value of λ. Furthermore, for any λ, we have the symmetry rela- 
tion (1.14). These properties can be verified at once from (2.7). 








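The numerical values of $H_\lambda(q)$ can easily be calculated for q = 1/2 and q = 2. 
Consider a value of λ, say of the order 6, so that the terms in $e^{-\lambda}$ may be 
neglected. Then formula (2.7) may be written 

(2.8) $$H_\lambda(2) = \lambda\int_0^1 e^{-\lambda(z + z^2)}\, dz = \lambda e^{\lambda/4}\int_0^1 e^{-\lambda(z + \frac{1}{2})^2}\, dz.$$

If we introduce 

$$\sqrt{\lambda}\,(z + \tfrac{1}{2}) = \frac{t}{\sqrt{2}}; \qquad \sqrt{\lambda}\, dz = \frac{dt}{\sqrt{2}},$$

the probability $H_\lambda(2)$ becomes a difference of two normal probability integrals, 

$$H_\lambda(2) = \sqrt{\pi\lambda}\; e^{\lambda/4}\left[\left(1 - F\left(\sqrt{\lambda/2}\right)\right) - \left(1 - F\left(3\sqrt{\lambda/2}\right)\right)\right],$$

where F stands for the normal probability function. 

The second expression may be neglected compared to the first one for λ ≥ 4, 
whence 

(2.9) $H_\lambda(2) = \sqrt{\lambda/2}\; e^{\lambda/4}\int_{\sqrt{\lambda/2}}^\infty e^{-t^2/2}\, dt.$ 

The symmetry relation (1.14) leads to the knowledge of $H_\lambda(\tfrac{1}{2})$. Thus the three 
probabilities $H_\lambda(\tfrac{1}{2})$, $H_\lambda(1)$, and $H_\lambda(2)$ are known. 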
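
Both the approximation (2.9) and the integral (2.7) are simple to evaluate numerically. The sketch below is illustrative: it reads (2.7) as H_λ(q) = λ(1 - e^(-λ))^(-2) ∫ exp(-λ(z + z^q)) dz - e^(-λ)/(1 - e^(-λ)) over 0 ≤ z ≤ 1, integrates it by Simpson's rule, and expresses the normal tail integral in (2.9) through the complementary error function. The function names are hypothetical.

```python
import math


def H_asymptotic_2(lam):
    """Approximation (2.9) to H_lambda(2)."""
    x = math.sqrt(lam / 2.0)
    # int_x^inf exp(-t^2 / 2) dt = sqrt(pi / 2) * erfc(x / sqrt(2))
    tail = math.sqrt(math.pi / 2.0) * math.erfc(x / math.sqrt(2.0))
    return x * math.exp(lam / 4.0) * tail


def H_numeric(q, lam, steps=20_000):
    """Probability function (2.7), integrated by Simpson's rule on [0, 1]."""
    dz = 1.0 / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * dz
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp(-lam * (z + z ** q))
    integral = total * dz / 3.0
    e = math.exp(-lam)
    return lam * integral / (1.0 - e) ** 2 - e / (1.0 - e)


if __name__ == "__main__":
    for lam in (18, 32, 50):
        print(lam, H_asymptotic_2(lam), H_numeric(2.0, lam), H_numeric(0.5, lam))
```

For λ = 18 and λ = 32 the approximation gives about .91377 and .94661, while the numerical integral confirms H_λ(1) = 1/2 and the symmetry relation (1.14).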

To see the influence of λ on $H_\lambda(2)$, we use a method due to R. D. Gordon [4]. 
This author considers a function $R_x$ defined by 

(2.10) $R_x = e^{x^2/2}\int_x^\infty e^{-t^2/2}\, dt, \qquad x > 0,$ 
and proves that 
Comin Lek Bae +e re 
dx dx? dix 
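It follows that 

$$\frac{d}{dx}(x R_x) > 0.$$

If we substitute $\sqrt{\lambda/2}$ for x, this inequality may be written, from (2.9) and (2.10), 

$$\frac{d}{d\sqrt{\lambda/2}}\left(\sqrt{\lambda/2}\; e^{\lambda/4}\int_{\sqrt{\lambda/2}}^\infty e^{-t^2/2}\, dt\right) = \frac{dH_\lambda(2)}{d\sqrt{\lambda/2}} > 0.$$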
Consequently $H_\lambda(2)$ increases with λ whereas, from (1.14), the probability 
$H_\lambda(\tfrac{1}{2})$ decreases with λ. The following table gives the probabilities $H_\lambda(2)$ and 
$H_\lambda(\tfrac{1}{2})$, from (2.9) and (1.14), and their differences 

(2.11) $P_\lambda(2) = H_\lambda(2) - H_\lambda(\tfrac{1}{2}).$ 


Asymptotic probabilities of the extremal quotient for symmetrical distributions of 
the exponential type 

Parameter      Probabilities (2.9), (1.14)      Probability (2.11)
    λ           H_λ(2)         H_λ(1/2)              P_λ(2)
    8           .84376         .15624                .68752
   18           .91377         .08623                .82754
   32           .94661         .05339                .89322
   50           .96438         .03562                .92876
   72           .97427         .02573                .94854
   98           .98087         .01913                .96174


The approximate shape of $H_\lambda(q)$ is traced, for λ = 8, ..., 98, and 1/2 < q < 2, 
in Graph (1). Since we know from (1.16) that lg q has a symmetrical distribu- 
tion, we use a logarithmically normal probability paper where q is plotted on 
the abscissa in a logarithmic scale, and $H_\lambda(q)$ is plotted on the ordinate in a 
normal probability scale. The probability $P_\lambda(2)$ for any value of q to be con- 
tained in the interval 1/2 < q < 2 increases with λ, i.e., with the sample size, and 
the distribution of the extremal quotient contracts. 


[Graph (1): Asymptotic probability of the extremal quotient for the exponential type. Abscissa: extremal quotient q.] 


If the initial distribution is unknown, the parameter λ has to be estimated 
from the observed extremal quotients. Equation (2.11) may be used for this 
purpose. We calculate the observed relative frequency of extremal quo- 
tients contained between q = 1/2 and q = 2, and substitute it for the probability 
$P_\lambda(2)$. To facilitate this estimate of λ, we trace $P_\lambda(2)$ against λ in Graph (2). 
The probability $P_\lambda(2)$ is traced on the ordinate in linear scale, and the parameter 
λ is traced on the abscissa in inverse scale. Thus λ is easily estimated from the 
observed relative frequency. 


[Graph (2): Estimation of the parameter λ. Ordinate: probability P_λ(2); abscissa: parameter λ.] 


The distribution $h_\lambda(q)$ of the extremal quotient obtained by differentiating 
the probability function (2.7) with respect to q is 

(2.12) $h_\lambda(q) = \dfrac{1}{(1 - e^{-\lambda})^2}\int_0^1 \lambda^2 e^{-\lambda(z + z^q)}\, z^q\, (-\lg z)\, dz.$ 

The symmetry relation (1.7) is easily verified. We now investigate the boundary 
value $h_\lambda(0)$ and prove that 

(2.13) $\lim_{q \to 0} h_\lambda(q) = h_\lambda(0).$ 

This is not obvious, since $z^q$ becomes indeterminate if both z and q vanish. For 
the proof of (2.13), consider the integral 

(2.14) $I = \lambda\int_0^1 e^{-\lambda z}(-\lg z)\, dz$ 





85 


75 


65 


ng 


ry 


‘or 
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or 
(2.15) I= (1—e)lgk —y+e°lgd — ei(—d). 


The last term, the exponential integral, is positive. The value of h,(0) is thus, 
from (2.12) 


_ reg ¥ — y — eé(—2)) 
(2.16) h,(0) = —i-o ’ 


The difference

$\Delta = (1 - e^{-\lambda})^2 \left( h_\lambda(q) - h_\lambda(0) \right)$

becomes, from (2.12), (2.15) and (2.16), by the use of the mean value theorem
and after expansion,

$\Delta = f(\lambda) \int_0^1 \left( e^{-\lambda z^q} z^q - e^{-\lambda} \right) dz = f(\lambda) \sum_{\nu=0}^{\infty} \frac{(-1)^\nu \lambda^\nu}{\nu!} \left( \frac{1}{(\nu + 1) q + 1} - 1 \right),$

where $f(\lambda)$ is a positive function. Since the series is absolutely convergent, the
difference $\Delta$ vanishes for $q = 0$, and the density of probability for $q = 0$ is given
by (2.16). The condition $h_\lambda(0) \ge 0$, valid for any distribution, is met provided
that

(2.17) $\lambda > 1.794.$

By virtue of (2.4) this is a (weak) condition concerning the sample size. From
(2.16) it follows that $h_\lambda(0)$ does not vanish although its numerical value is very
small.

The existence of at least one mode follows from the fact that the distribution
$h_\lambda(q)$ is continuous, very small for $q = 0$, and vanishes for $q = \infty$. Equation
(1.9) proves that any mode is less than unity. The distribution contracts for
increasing values of the parameter. Therefore the mode approaches the median
with increasing sample size.

Since the distributions of the exponential type do not possess reciprocal
moments, it follows from (1.10) that the distribution $h_\lambda(q)$ does not possess moments.
The mean extremal quotient $\bar{q}$ diverges. Because the logarithmically normal
distribution used in graph (1) as first approximation to the distribution $h_\lambda(q)$
possesses all moments, the distribution $h_\lambda(q)$ has a much longer tail than the
logarithmically normal one.


3. Application to the Cauchy type. For the exponential type, the asymptotic
distribution of the extremal quotient can only be expressed in the form of an
integral containing a parameter $\lambda$ which is a function of the sample size. For the
Cauchy type, to be defined in the following, the asymptotic distribution will
turn out to be very simple.
A distribution of a variate $x \ge 1$ was said [5] to be of the Pareto type if

(3.1) $\lim_{x \to \infty} x^k (1 - F(x)) = A; \quad k > 0; \quad A > 0.$


We now say that a variate is of the Cauchy type if it is unlimited, continuous, 
subject to (3.1), and symmetrical about zero. Distributions of the Pareto and 
the Cauchy type do not possess moments of an order equal to or larger than k. 
However, not all unlimited symmetrical distributions with a finite number of 
moments are of the Cauchy type. 

The simplest example of such a distribution is the Cauchy distribution itself,

(3.2) $f(x) = \frac{1}{\pi (1 + x^2)},$

which possesses no moments. For large absolute values of $x$, the usual expansion
of

$F(x) = \frac{1}{2} + \frac{1}{\pi} \operatorname{arc\,tg} x$

leads to

$F(x) = 1 - \frac{1}{\pi x} + O(x^{-3}); \quad F(-x) = \frac{1}{\pi x} + O(x^{-3}).$

If the terms $O(x^{-3})$ are neglected, the parameters $A$ and $k$ in (3.1) are

(3.2') $A = \pi^{-1}; \quad k = 1.$
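The limit in (3.1) can be checked numerically for the Cauchy distribution. The following is a minimal sketch; the function names are illustrative only, and the upper tail is written as $\operatorname{arc\,tg}(1/x)/\pi$, an identity valid for $x > 0$ that avoids cancellation at large $x$.

```python
import math

def cauchy_cdf(x: float) -> float:
    """Cauchy probability F(x) = 1/2 + arctan(x)/pi."""
    return 0.5 + math.atan(x) / math.pi

def upper_tail(x: float) -> float:
    """1 - F(x), computed as arctan(1/x)/pi for x > 0."""
    return math.atan(1.0 / x) / math.pi

for x in (10.0, 100.0, 1000.0):
    # With k = 1, x**k * (1 - F(x)) should approach A = 1/pi.
    print(x, x * upper_tail(x), 1.0 / math.pi)
```

The product approaches $1/\pi$ with an error of order $x^{-2}$, in agreement with the neglected terms $O(x^{-3})$ in $F(x)$.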


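For the Cauchy type, the asymptotic probability $\Pi(x)$ and distribution $\pi(x)$
of the largest value $x = x_n$, established by Fréchet [3], R. A. Fisher [2] and R. von
Mises [8], are

(3.3) $\Pi(x) = \exp\left[ -\left( \frac{u}{x} \right)^k \right]; \quad \pi(x) = \frac{k}{u} \left( \frac{u}{x} \right)^{k+1} \exp\left[ -\left( \frac{u}{x} \right)^k \right],$

where $u$ is defined by (2.2).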

The condition (1.19) is fulfilled for any sample size which is so large that the
asymptotic distribution of the extremes may be used. The asymptotic probability
$H_k(q)$ of the extremal quotient for the Cauchy type is obtained from (1.11),
if $y$, $f(y)$ and $F(y)$ are replaced by $x$, $\pi(x)$, and $\Pi(x)$, respectively, where the
indices $n$ and $\alpha$ are omitted. Consequently, from (3.3),

$H_k(q) = \frac{k}{u} \int_0^\infty \left( \frac{u}{x} \right)^{k+1} \exp\left[ -\left( \frac{u}{x} \right)^k \left( 1 + q^{-k} \right) \right] dx.$

From the transformation

$\left( \frac{u}{x} \right)^k = z; \quad \frac{k}{u} \left( \frac{u}{x} \right)^{k+1} dx = -dz,$

the asymptotic probability $H_k(q)$ and the asymptotic distribution $h_k(q)$ of the
extremal quotient become simply

(3.4) $H_k(q) = \frac{q^k}{1 + q^k}; \quad h_k(q) = \frac{k q^{k-1}}{(1 + q^k)^2}; \quad q \ge 0.$

Evidently, the symmetry relations (1.7) and (1.14) are fulfilled for any $k$. The
graphs (3) and (4) show the distribution $h_k(q)$ and the probability $H_k(q)$ for
the most interesting cases $k = 1, 2, 3$. From


$k \lg q = \lg \frac{H_k(q)}{1 - H_k(q)}$

it follows: For $k$ increasing, the probability $H_k(q)$ decreases for $q < 1$, and
increases for $q > 1$. The distribution contracts with increasing values of the parameter
$k$ as shown in the graphs (3) and (4). The more moments that exist in the initial
distribution, the more concentrated is the distribution of the extremal quotient.

[Graph (3). Distributions of the extremal quotient for the Cauchy type. Abscissa: extremal quotient $q$.]


The density of probability

$h_k(1) = k/4$

of the median, obtained from (3.4) and (1.14'), increases with $k$. The mode $\tilde{q}$ of
the extremal quotient is obtained from (3.4). For $k > 1$ this leads to

(3.5) $\tilde{q}^{\,k} = \frac{k - 1}{k + 1} < 1.$

For $k \le 1$ no mode exists, and the distribution diminishes with $q$. The larger
$k$, the smaller is the distance from the median to the mode, and hence, the
smaller the asymmetry. The density of probability of the mode increases with
$k$, and the probability

(3.6) $H_k(\tilde{q}) = \tfrac{1}{2} (1 - 1/k)$

approaches $\tfrac{1}{2}$. The distribution (3.4) belongs to the Pareto type and has no
moments of an order equal to or greater than $k$.

In $N$ samples of sufficiently large size $n$, the largest quotient $q_N$, defined in
the same way as $u$ in equation (2.2), obtained from (3.4),

(3.7) $q_N^k = N - 1,$

increases as a root of the number of samples, i.e. very quickly. The higher the
order of the highest moments existing, the smaller will the expected largest
quotient be.

From (3.4) and the symmetry (1.14) we obtain

(3.8) $H_k(q) - H_k(1/q) = 1 - 2/(1 + q^k).$

The larger $k$, the larger is the percentage of the observations contained in the
interval $1/q$ to $q$.
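The closed forms of this section are easy to check numerically. The sketch below assumes the readings $H_k(q) = q^k/(1 + q^k)$ and $h_k(q) = k q^{k-1}/(1 + q^k)^2$ for (3.4), $\tilde{q}^{\,k} = (k - 1)/(k + 1)$ for (3.5), and $q_N^k = N - 1$ for (3.7); the function names are illustrative only.

```python
def H(q: float, k: float) -> float:
    """Asymptotic probability (3.4): H_k(q) = q^k / (1 + q^k)."""
    return q**k / (1.0 + q**k)

def h(q: float, k: float) -> float:
    """Asymptotic density (3.4): h_k(q) = k q^(k-1) / (1 + q^k)^2."""
    return k * q ** (k - 1.0) / (1.0 + q**k) ** 2

def mode(k: float) -> float:
    """Mode from (3.5), q^k = (k - 1)/(k + 1); defined only for k > 1."""
    if k <= 1.0:
        raise ValueError("no mode exists for k <= 1")
    return ((k - 1.0) / (k + 1.0)) ** (1.0 / k)

def largest_quotient(N: int, k: float) -> float:
    """Largest quotient from (3.7): q_N^k = N - 1."""
    return (N - 1.0) ** (1.0 / k)

for k in (2.0, 3.0):
    qm = mode(k)
    # Columns: k, mode, H at the mode, (1 - 1/k)/2 from (3.6), h(1), k/4.
    print(k, qm, H(qm, k), 0.5 * (1.0 - 1.0 / k), h(1.0, k), k / 4.0)
```

The third and fourth printed columns should agree, as should the fifth and sixth, which reproduces (3.6) and the median density $k/4$.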

For a systematic estimate of $k$, the transformation (1.15) is used. Formula
(3.4) leads to the probability $H^*(z)$ and the distribution $h^*(z)$, where

(3.9) $H^*(z) = \frac{1}{1 + e^{-kz}}; \quad h^*(z) = \frac{k e^{-kz}}{(1 + e^{-kz})^2}.$

The logarithm of the extremal quotient for initial distributions of the Cauchy
type (where no moments of an order equaling or exceeding $k$ exist) has the
logistic distribution [6], as has the midrange $v = x_1 + x_n$ for distributions of the
exponential type (where all moments exist). The logarithm of the extremal
quotient plotted on logistic probability paper should be scattered around a
straight line.

The order $k$ of the lowest moment which diverges is obtained from the variance
$\sigma_z^2$ of the distribution $h^*(z)$, which is [6]

(3.10) $\sigma_z^2 = \frac{\pi^2}{3 k^2}.$

For the estimate of $k$ from (3.10), $\sigma_z^2$ is replaced by the estimate $s_z^2$ obtained from

(3.11) $s_z^2 = \frac{1}{N - 1} \sum_{\nu=1}^{N} (z_\nu - \bar{z})^2.$

For the Cauchy distribution itself, $k = 1$, and the probability and the
distribution of the extremal quotient,

$H_1(q) = q/(1 + q); \quad h_1(q) = (1 + q)^{-2},$

are similar to the initial distribution.

The asymptotic distribution of the extremal quotient for initial distributions
of the Cauchy type contains one parameter only, the order of the lowest diverging
moment in the initial distribution. All other traces of the initial distribution
have disappeared.


4. Comparison of the extremal properties for the two types of initial distribu- 
tions. Assume that the initial distribution is symmetrical, unlimited, and pos- 
sesses an asymptotic distribution of the extremes. This is not always fulfilled. 
All moments may exist, and yet the distribution may not belong to the expo- 
nential type. No moments may exist, and yet the distribution may not belong 
to the Cauchy type. If the assumption holds, the initial distribution belongs 
either to the Cauchy, or to the exponential type. 

We take $N$ samples of size $n$, and estimate the median $\tilde{X}$ of the population
from the central value $m$ of the $N$ central values of the samples. Let $X_{1,\nu}$ and
$X_{n,\nu}$ ($\nu = 1, 2, \cdots, N$) be the two extremes. If it happens for any $\nu$ that

$X_{1,\nu} > m$ or $X_{n,\nu} < m,$

the sample is too small, and its size has to be increased. The central value $\tilde{q}$ of
the observed extremal quotients $q_\nu = (X_{n,\nu} - m)/(m - X_{1,\nu})$ must be near
unity.
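The computation of the observed extremal quotients can be sketched as follows. The function name and the small data set are hypothetical, and the "central value" is taken to be the median.

```python
import statistics

def extremal_quotients(samples: list[list[float]]) -> list[float]:
    """Return q_v = (X_n,v - m) / (m - X_1,v) for each sample v.

    m is the central value (median) of the sample medians.
    """
    m = statistics.median(statistics.median(s) for s in samples)
    quotients = []
    for s in samples:
        smallest, largest = min(s), max(s)
        if smallest >= m or largest <= m:
            # The text's condition is X_1 > m or X_n < m; equality is also
            # rejected here because the quotient would be 0 or undefined.
            raise ValueError("sample too small: m lies outside its extremes")
        quotients.append((largest - m) / (m - smallest))
    return quotients

samples = [[-3, -1, 0, 2, 4], [-2, 0, 1, 3, 5], [-4, -2, 0, 1, 2]]
qs = extremal_quotients(samples)
print(qs, statistics.median(qs))
```

In practice the samples would have to be large enough for the asymptotic theory to apply; the five-point samples above only show the arithmetic.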

If the initial distribution is of the exponential type, all moments in the
population exist, and the midrange has the logistic distribution. If the initial
distribution is of the Cauchy type, no moments of an order equal to or greater than
$k$ exist, and the logarithm of the extremal quotient has the logistic distribution.
The order $k$ can be estimated from the variance (3.11). If all moments in the
population diverge, the calculation of the observed moments is futile since they do
not characterize the population.


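Addendum. The referee of this paper has suggested the following method for
obtaining an asymptotic distribution of the extremal quotient for the exponential
type. For large values of $\lambda$, formula (2.7) becomes, approximately,

$H_\lambda(q) = \lambda \int_0^1 \exp\{ -\lambda (z + z^q) \}\, dz.$

Let

$\lambda z = y.$

Then

$H_\lambda(q) = \int_0^\lambda \exp\left\{ -y \left[ 1 + \left( \frac{y}{\lambda} \right)^{q-1} \right] \right\} dy.$

The further transformation

$e^t = \lambda^{q-1}; \quad q - 1 = t / \lg \lambda,$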





538 E. J. GUMBEL AND R. D. KEENEY 


leads to the probability $H^*(t)$ of the variate $t$

$H^*(t) = \int_0^\lambda \exp\{ -y [ 1 + e^{-t} y^{t/\lg \lambda} ] \}\, dy,$

whence asymptotically for $\lambda \to \infty$

$H^*(t) = \int_0^\infty \exp\{ -y (1 + e^{-t}) \}\, dy = 1/(1 + e^{-t}).$

Therefore the logistic distribution holds at the same time for both initial types,
using the transformation $t = \alpha u (q - 1)$ for the exponential type, and the
logarithmic transformation for the Cauchy type.
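The approach to the logistic form can be examined numerically. The sketch below assumes the integrand $\exp\{-y[1 + e^{-t} y^{t/\lg\lambda}]\}$ for $H^*(t)$ and evaluates it by a simple midpoint rule; the cut-off `upper`, the step count and the value $\lg \lambda = 50$ are arbitrary illustrative choices.

```python
import math

def H_star(t: float, log_lambda: float, upper: float = 80.0, steps: int = 80000) -> float:
    """Midpoint-rule value of the integral of exp{-y - e^{-t} y^(1 + t/lg lambda)}.

    The range is cut at `upper` because the integrand is below exp(-upper)
    beyond it; this presumes that lambda itself exceeds `upper`.
    """
    eps = t / log_lambda
    a = math.exp(-t)
    dy = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * dy
        # y * [1 + a * y**eps] written as y + a * y**(1 + eps)
        total += math.exp(-y - a * y ** (1.0 + eps))
    return total * dy

def logistic(t: float) -> float:
    """Limiting form 1 / (1 + e^{-t})."""
    return 1.0 / (1.0 + math.exp(-t))

for t in (-2.0, 0.0, 1.0, 3.0):
    print(t, H_star(t, log_lambda=50.0), logistic(t))
```

Because the correction enters through $t/\lg\lambda$, the convergence is only logarithmic in $\lambda$, so visibly close agreement requires very large $\lambda$.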


REFERENCES 


[1] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[2] R. A. Fisher and L. H. C. Tippett, "Limiting forms of the frequency distribution
of the smallest and the largest member of a sample," Proc. Camb. Philos. Soc.,
Vol. 24 (1928), p. 180.

[3] M. Fréchet, "Sur la loi de probabilité de l'écart maximum," Annales Soc. Polon. Math.,
Vol. 6 (1927).

[4] R. D. Gordon, "Values of Mills ratio of area to boundary ordinate and of the normal
probability integral for large values of the argument," Annals of Math. Stat.,
Vol. 12 (1941), pp. 364-366.

[5] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941),
pp. 163-190.

[6] E. J. Gumbel, "Ranges and midranges," Annals of Math. Stat., Vol. 15 (1944), pp.
414-422.

[7] E. J. Gumbel, "On the independence of the extremes in a sample," Annals of Math.
Stat., Vol. 17 (1946), pp. 78-81.

[8] R. von Mises, "La distribution de la plus grande de n valeurs," Revue Math. de l'Union
Interbalkanique, Vol. 1 (1936).










ON A PRELIMINARY TEST FOR POOLING MEAN SQUARES
IN THE ANALYSIS OF VARIANCE¹


By A. E. Paull

Summary. The paper describes the consequences of performing a preliminary
F-test in the analysis of variance. The use of the 5% or 25% significance level 
for the preliminary test results in disturbances that are frequently large enough 
to lead to incorrect inferences in the final test. A more stable procedure is recom- 
mended for performing the preliminary test in which the two mean squares 
are pooled only if their ratio is less than twice the 50% point. 



















I. INTRODUCTION 


The problem discussed in this paper is one of a large class involving preliminary 
tests of significance. Studies of this type have recently been made by Bancroft 
[1] and Mosteller [2]. Bancroft dealt with a preliminary test for homogeneity 
of two variances, and a test of a regression coefficient. Mosteller dealt with the 
problem of pooling means from two normal populations having the same known 
variance. The present problem is an extension of Bancroft’s work from investiga- 
tions of the bias and variance of an estimate of variance, to investigations of the 
consequences of using that estimate in performing a further test of significance. 

The problem arises frequently in the analysis of variance. As a simple example, 
consider an experiment carried out to test the hypothesis that different labora- 
tories in a district all determine the protein content of wheat without systematic 
differences between laboratories. Three laboratories are selected at random 
and each is requested to analyze ten samples of the same wheat, five on each of 
two days. The analysis of variance would be set up in one of two ways: 












MODEL I                                        MODEL II
Source of variation          df   MS          Source of variation     df   MS

Between laboratories          2   $v_3$       Between laboratories     2   $v_3$
Between days within labs.     3   $v_2$       Within laboratories     27   $(3v_2 + 24v_1)/27$
Within days                  24   $v_1$










The soundest procedure is to follow Model I in which the F-ratio, $v_3/v_2$,
provides a valid though not very powerful test of the null hypothesis. But the
investigator often doubts that this is the most effective form of analysis. His past
experience may have shown that measurements of this kind seldom exhibit
day-to-day variations appreciably greater than their within-day variations.
If he is willing to accept this credible assumption, he adopts Model II because
this increases the degrees of freedom from 2 and 3 to 2 and 27. These two models
may conveniently be called the "never pool" and the "always pool" procedures.

¹ Based on a doctoral dissertation submitted to the Faculty of North Carolina State
College of the University of North Carolina at Raleigh, N. C., in June, 1948. Published as
Paper No. 107 of the Grain Research Laboratory, Board of Grain Commissioners, Winnipeg.

The investigator often prefers what may be called a "sometimes pool" procedure.
He starts with Model I and examines the null hypothesis that the
variation between days is no greater than the variation within days by testing
the F-ratio $v_2/v_1$. For this test, he selects a probability level $P_1$ that may be the
5% or some higher level. If the hypothesis of this preliminary test is not rejected,
his judgement has been substantiated and he adopts Model II and pools the
two mean squares. If the hypothesis is rejected, he retains Model I since he
concludes that $v_2$ alone is the only valid estimate of error.

The following notation is introduced:

Degrees of freedom    Mean square    Expected value of mean square

$n_3$                 $v_3$          $\sigma_3^2$
$n_2$                 $v_2$          $\sigma_2^2$
$n_1$                 $v_1$          $\sigma_1^2$

where $\sigma_1^2 \le \sigma_2^2 \le \sigma_3^2$.

The mean squares $v_1$, $v_2$, and $v_3$ are assumed to be distributed as central
chi-squares. This assumption is justified if the treatments (laboratories in the
example) are selected at random from a population of treatments. But if, as is 
more frequently the case, the experimenter is interested only in specified treat- 
ments, the non-central chi-square model is the appropriate one. However, if 
the two cases are sufficiently parallel, as seems probable, conclusions drawn 
from the central model may be expected to apply to the non-central model. 

Let $\theta_{21} = \sigma_2^2/\sigma_1^2$ and $\theta_{32} = \sigma_3^2/\sigma_2^2$, and let $F(\nu_1, \nu_2; P)$ denote the value exceeded
by $F$ for $\nu_1$ and $\nu_2$ degrees of freedom with probability $P$. The rule of procedure
for the "sometimes pool" test may be restated as follows:

Reject the main hypothesis that $\sigma_3^2 = \sigma_2^2$ ($\theta_{32} = 1$) if

$v_2/v_1 \ge F_1(n_2, n_1; P_1)$ and $v_3/v_2 \ge F_2(n_3, n_2; P_2)$

or if

$v_2/v_1 < F_1(n_2, n_1; P_1)$ and $(n_2 + n_1) v_3/(n_2 v_2 + n_1 v_1) \ge F_3(n_3, n_2 + n_1; P_3)$.

The "never pool" procedure, in which $P_2$ is used, and the "always pool" procedure,
in which $P_3$ is used, may be considered as special cases of the "sometimes pool"
procedure in which $P_1$ takes on its extreme values, 1 and 0 respectively. In
practice, the probability levels $P_2$ and $P_3$ are usually the same; in the present
study they are allowed to be different in case this greater flexibility should prove
desirable. The objects of the investigation are: (a) to examine the Type I error
under the above rule of procedure, i.e., to determine the frequency of rejecting
the null hypothesis when it is true; and (b) to examine the behaviour of the power
with particular reference to comparisons with the power of the "never pool"
procedure.
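The rule of procedure can be written out as a short routine. This is only a sketch: the function names are illustrative, the three critical values are supplied by the caller (for instance from an F table), and the mean squares in the usage lines are invented for illustration.

```python
def pooled_mean_square(v1: float, v2: float, n1: int, n2: int) -> float:
    """Pooled error mean square (n2*v2 + n1*v1) / (n2 + n1)."""
    return (n2 * v2 + n1 * v1) / (n2 + n1)

def sometimes_pool_rejects(v1, v2, v3, n1, n2, F1, F2, F3) -> bool:
    """Apply the "sometimes pool" rule.

    F1 = F(n2, n1; P1), F2 = F(n3, n2; P2) and F3 = F(n3, n2 + n1; P3) are
    critical values supplied by the caller.
    """
    if v2 / v1 >= F1:
        # Preliminary test rejects: do not pool, test v3 against v2 alone.
        return v3 / v2 >= F2
    # Preliminary test does not reject: test v3 against the pooled error.
    return v3 / pooled_mean_square(v1, v2, n1, n2) >= F3

# Laboratory example: n1 = 24, n2 = 3, n3 = 2, all tests at the 5% level.
# Approximate tabled 5% points: F(3,24) = 3.01, F(2,3) = 9.55, F(2,27) = 3.35.
# The mean squares below are hypothetical.
print(sometimes_pool_rejects(1.0, 1.2, 4.5, 24, 3, 3.01, 9.55, 3.35))
print(4.5 / 1.2 >= 9.55)   # the "never pool" decision on the same data
```

With these invented mean squares the preliminary test prescribes pooling and the pooled test rejects, whereas the "never pool" test on 2 and 3 degrees of freedom does not.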

The remainder of this paper is divided into three parts: Part II contains a
general discussion of the results, conclusions and recommendations; Part III
illustrates the general conclusions with numerical examples; and the derivation of
distributions, proofs by elementary arguments of general qualitative results, and
derivations of closed form expressions for $n_3 = 2$ are given in Part IV.




































II. GENERAL DISCUSSION OF RESULTS, CONCLUSIONS AND RECOMMENDATIONS


2.1. Criterion employed. In this part the principal results and recommenda- 
tions are discussed for the reader who is not interested in the mathematical 
details. To give results in a simple form is not easy, because of the many variables
(the $P$'s, the $\theta$'s, and the $n$'s) that enter into the problem. It may be helpful
to consider what is wrong with the “always pool” test, and then to state the 
properties which the preliminary test must have if it is to be regarded as useful 
and successful. 

If the "always pool" procedure is employed when in fact $\sigma_2^2$ is greater than
$\sigma_1^2$, i.e. $\theta_{21} > 1$, the denominator in the final F test tends to be too small. Thus
the final F test gives too many significant results when its null hypothesis is
true, and if $\theta_{21}$ is great enough, there is no bound to this hidden distortion of the
significance level. A test which the research worker thinks is being made at the
5% level might actually be at, say, the 47% level.

The preliminary test represents an attempt to avoid this alarming disturbance, 
since if $\theta_{21}$ is very large the test is expected to warn against pooling. Such a
procedure, however, can not be expected to remove this disturbance completely, 
and it does not do so, but to be successful it should keep the true or effective 
significance level of the final F test close to the nominal level at which the 
research worker thinks he is working. 

A second requirement is that the preliminary test should increase the power in 
the final F test relative to the power of the “never pool” test. When the powers of 
the ‘‘sometimes pool” and “never pool” tests are compared, it is important to 
make the comparison at the same significance level. Suppose the preliminary test
shifts the significance level of the final F test from the 5% to the 6% level, a
disturbance that for some uses would not be regarded as serious. In this event the
“sometimes pool” test (at the 6% level) would tend to be more powerful than 
the ‘never pool” test at the 5% level, because an increase in significance level 
generally results in an increase in power. But unless the “‘sometimes pool’’ test 
has more power than a “never pool” test made also at the 6% level, it has no 
real advantage over the “never pool” procedure. 









2.2. Effect of preliminary tests made at the 5% level. Probably the most
common procedure in practice is to perform the preliminary test at the 5% level
(i.e. $P_1 = .05$) and, whether pooling is prescribed or not, to conduct the final F
test also at the 5% level (i.e. $P_2 = P_3 = .05$). Such a procedure, except when
$\theta_{21}$ is near one and the null hypothesis is true, results in the null hypothesis being
rejected more frequently than if pooling is never resorted to.

When the ratio $\theta_{21}$ is equal to one, so that routine pooling would be valid, the
preliminary test is effective. The true significance level of the final F test is
decreased slightly, but is always confined between the 5% and the 4.75% levels.
Further, the power is always greater than that of the "never pool" test made
at the same significance level.

As $\theta_{21}$ increases from 1, the true significance level of the final F test increases
to a maximum and then slowly decreases to 5%. Unfortunately the maximum
need not be near to 5%: in the example presented later it is about 15%, and for a
broad range of values of $\theta_{21}$ the true significance level is higher than 10%.
Comparison with the power of the "never pool" test is also unfavorable to the
"sometimes pool" test. For values of $\theta_{21}$ near 1, the "sometimes pool" test has the
higher power, but as $\theta_{21}$ becomes larger the advantage passes to the "never
pool" test.

When $\theta_{21}$ is very large there is, as would be expected, little disturbance. The
preliminary test seldom prescribes pooling, so that the properties of the
"sometimes pool" test are very similar to those of the "never pool" test, although the
"never pool" procedure yields the slightly higher power.

The main objection to the use of the "sometimes pool" test is associated with
the intermediate values of $\theta_{21}$. If over a series of experiments $\theta_{21}$ has a moderate
value greater than one, the "sometimes pool" test at the 5% levels yields more
apparently significant results than are anticipated, and is also less powerful
than a corresponding "never pool" test. The magnitude of these undesirable
properties can be reduced somewhat by increasing the significance level of the
preliminary test.


2.3. Effect of preliminary tests made at the 25% level. Use of the 25% instead
of the 5% significance level for the preliminary test reduces, in general, the
probability of rejecting the hypothesis. This reduction, at intermediate values
of $\theta_{21}$, results in a reduction of the extreme disturbances. When the ratio $\theta_{21}$ is
equal to one, however, the effects are not as favourable. If the hypothesis is
true, still fewer apparently significant results occur. A final test being carried
out at the 5% level can now have an effective significance level close to 3.75%.
If the hypothesis is false, the test is still more powerful than a corresponding
"never pool" test but the gain is not as great as when a preliminary test at the
5% level is employed. Since most experimenters desire a reasonable amount of
protection against an error in judgement of the true value of $\theta_{21}$, the reduction
in disturbances for intermediate values of $\theta_{21}$, resulting from the use of the 25%
rather than the 5% level, would be judged to outweigh the disadvantages of the
compensating factors.


2.4. Effect of further increases in significance level. Increasing $P_1$, the
significance level of the preliminary test, decreases the probability of rejecting
the hypothesis only to the point where a critical value, denoted here by $P_c$, is
reached. Increasing $P_1$ beyond this value results in an increase in the probability
of rejection. The properties of a "sometimes pool" test in which $P_1$ is less than
$P_c$ differ, in general, from those of a test in which $P_1$ is greater than $P_c$.

Tests of the former type, which are referred to here as Class A tests, are the
tests commonly encountered in practice. Considering, for example, a test in
which $P_2 = P_3 = .05$ and $n_1 = 20$, $n_2 = 4$, $n_3 = 2$, we find the critical value $P_c$
to be .77, a figure much larger than the values .05 or .25 customarily chosen
for $P_1$. The major portion of the present discussion deals with Class A tests.
Tests in which $P_1$ is greater than $P_c$ are referred to as Class B tests and discussion
of their properties is relegated to a later section. An expression for evaluating
$P_c$ is given in Subsection 4.3.

2.5. Effect of $P_2$, $P_3$. The probability levels ($P_2$, $P_3$) used for the final test
determine the properties of the "sometimes pool" test for extreme values of $\theta_{21}$.
When $\theta_{21}$ is equal to one, the effective significance level is less than the nominal
value $P_3$, but is not less than $(1 - P_1) P_3$. The power of such a test is greater
than the power of a corresponding "never pool" test, but less than the power of a
test in which one always pools and uses the $P_3$ level. For very large values of $\theta_{21}$
the behaviour of the "sometimes pool" test approaches, in all respects, the
behaviour of a "never pool" test at the $P_2$ level.
2.6. Effect of $n_2$, $n_1$. The degrees of freedom $n_2$ and $n_1$, associated with the
mean squares that are sometimes pooled, clearly affect the magnitude of the
disturbance. Because analytic investigation becomes complex, the following
remarks are based on conjectures arising out of examination of a number of
numerical examples.

A large value of $n_2$ is desirable in two respects. As $n_2$ becomes larger the
preliminary test becomes more powerful and pooling is prescribed less often. In
addition, when pooling is prescribed the pooled mean square is further weighted
in favour of the valid error $\sigma_2^2$. Both factors contribute towards a decrease
in bias of the error mean square with a consequent reduction in the disturbance
introduced into the final test.

The effect of $n_1$ is not as simple. As $n_1$ becomes larger the preliminary test
again becomes more powerful and pooling is prescribed less often. But when
pooling is prescribed, the pooled mean square in this case is further weighted in
favour of $\sigma_1^2$, which is smaller than the valid error $\sigma_2^2$. The effect on the final
test, which is due to a combination of these two factors, clearly depends on the
value of $\theta_{21}$. For intermediate values of $\theta_{21}$ the latter factor is the predominant
one, and the disturbance of the effective significance level is increased as $n_1$ is
increased.

2.7. Class B Test. A Class B test is one in which the probability level ($P_1$)
of the preliminary test is greater than a critical value, denoted here by $P_c$. Pooling
is prescribed only when the mean square $v_1$ is relatively large, with the result that
the error mean square tends to be too large. Accordingly, a Class B "sometimes pool"
test rejects the hypothesis less frequently than a "never pool" test at the $P_2$ level.

The effective significance level of a Class B test is less than $P_2$ for all values
of $\theta_{21}$. It has its lowest value when $\theta_{21}$ is equal to one, and approaches $P_2$ as $\theta_{21}$
becomes very large. Because pooling is prescribed infrequently, little power is
gained by the use of a Class B test rather than a "never pool" test.


2.8. Recommendations. The principal conclusions discussed in the preceding
subsections may be summarized as follows: A preliminary test carried out at a
significance level as low as 5% affords little protection against errors in
judgement. If $\sigma_1^2$ is equal to $\sigma_2^2$ ($\theta_{21} = 1$) the reduction in errors of inference is
appreciable; but if, in fact, $\sigma_1^2$ is less than $\sigma_2^2$ ($\theta_{21} > 1$), a greater number of incorrect
inferences are made than if a preliminary test is not employed at all. The use
of the 25% significance level for the preliminary test introduces the same
disturbances but to a lesser extent. Extreme increases in the effective significance
level at possible values of $\theta_{21}$ are reduced and losses in power at these values are
not as serious. The 25% level provides a reasonable amount of protection against
an error in judgement regarding the true value of $\theta_{21}$. However, when $n_2$ is
large relative to $n_1$, a smaller significance level could be employed without
introducing any serious disturbances at the intermediate values of $\theta_{21}$ and
with a resulting gain in power at values of $\theta_{21}$ near one.

The following method of performing a preliminary test is recommended as one
which tends to stabilize the disturbances at intermediate values of $\theta_{21}$ while still
taking advantage of a considerable portion of the possible gain in power at
values of $\theta_{21}$ near one. The procedure consists of pooling the two mean squares
$v_2$ and $v_1$ only if their ratio is less than $2F_{50}$, where $F_{50}$ is the 50 per cent point
of the F-distribution for $n_2$ and $n_1$ degrees of freedom. The use of the multiple 2
is arbitrary and a smaller value may be used if the experimenter desires additional
control over extreme disturbances.

This procedure has the advantage of admitting less disturbance over a larger
range of values of $n_2$ and $n_1$. The customary method prescribes pooling if the null
hypothesis ($\theta_{21} = 1$) of the preliminary test is not rejected at some preassigned
probability level $P_1$. If enough observations are available to provide reliable
values for $v_2$ and $v_1$, pooling is prescribed only if $\sigma_2^2$ and $\sigma_1^2$ are essentially the same.
However, if small numbers of degrees of freedom are involved, the preliminary
test is too weak to reject the hypothesis even if $\sigma_1^2$ is appreciably less than $\sigma_2^2$,
and pooling will be prescribed too frequently. On the other hand, the use of the
recommended procedure has the effect of prescribing pooling only when it can
be said, with confidence exceeding 50%, that the true value of $\theta_{21}$ is less than
some chosen value such as 2.

This can be demonstrated simply by considering a series of experiments
in which preliminary tests are performed. When $v_2/v_1 < 2F_{50}$, we make the
statement

(1) $\theta_{21} < 2,$

and when $v_2/v_1 > 2F_{50}$, we make the statement

(2) $\theta_{21} > 2.$

We have

$\Pr\{ v_2/(\theta_{21} v_1) < F_{50} \} = .50.$

If statement (1) is true,

$\Pr\{ v_2/v_1 < 2F_{50} \} > .50,$

and if statement (2) is true,

$\Pr\{ v_2/v_1 > 2F_{50} \} > .50.$

Thus, no matter what the true value of $\theta_{21}$, the statements are true more
than 50% of the time.

Fifty per cent points of the F-distribution have been tabulated by Merrington 
and Thompson [3]. 

A simpler rule, and one which is nearly equivalent when the degrees of freedom
involved are each greater than 6, is to pool if the ratio of the mean squares is less
than 2, without any reference to the F-table. For smaller numbers of degrees of
freedom, however, this simpler rule does not embody the advantages of the
$2F_{50}$ rule, unless of course, $n_1$ and $n_2$ are equal.
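The behaviour of the simpler rule can be explored by simulation. The following Monte Carlo sketch is not taken from the paper: it assumes $n_3 = 2$ so that the critical values of the final tests have a closed form, it generates each mean square as $\sigma^2 \chi^2/\text{df}$, and the seed, trial count and function names are arbitrary illustrative choices.

```python
import random

def f_upper_point_num2(n: int, P: float) -> float:
    """Value exceeded with probability P by F on 2 and n degrees of freedom.

    For 2 numerator degrees of freedom Pr{F > f} = (1 + 2f/n)^(-n/2),
    which can be inverted in closed form.
    """
    return (n / 2.0) * (P ** (-2.0 / n) - 1.0)

def mean_square(sigma2: float, df: int, rng: random.Random) -> float:
    """Simulate a mean square: sigma^2 * chi-square(df) / df."""
    return sigma2 * rng.gammavariate(df / 2.0, 2.0) / df

def type_one_error(theta21: float, n1: int = 20, n2: int = 4, P: float = 0.05,
                   trials: int = 40000, seed: int = 1) -> float:
    """Monte Carlo rejection rate under theta32 = 1 for the rule
    'pool if v2/v1 < 2', with n3 = 2 and both final tests at level P."""
    rng = random.Random(seed)
    F_never = f_upper_point_num2(n2, P)
    F_pool = f_upper_point_num2(n2 + n1, P)
    rejections = 0
    for _ in range(trials):
        v1 = mean_square(1.0, n1, rng)
        v2 = mean_square(theta21, n2, rng)
        v3 = mean_square(theta21, 2, rng)   # sigma_3^2 = sigma_2^2 under the null
        if v2 / v1 < 2.0:
            error = (n2 * v2 + n1 * v1) / (n2 + n1)
            rejections += v3 / error >= F_pool
        else:
            rejections += v3 / v2 >= F_never
    return rejections / trials

for theta21 in (1.0, 2.0, 4.0, 16.0):
    print(theta21, type_one_error(theta21))
```

Running the loop for a range of values of $\theta_{21}$ gives a rough picture of how far the effective significance level departs from the nominal 5% under this rule.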


III. NUMERICAL ILLUSTRATIONS


3.1. Effect of $P_1$ illustrated. An example of the influence of $P_1$ on the effective
significance level or Type I error of a "sometimes pool" test is illustrated in
Figure 1. When $P_1 = 0$, the Type I error has its maximum value, equivalent to
the Type I error of an "always pool" test at the $P_3$ level. As $P_1$ increases from
zero, the Type I error decreases until at $P_1 = P_c$, the critical value (.77 in this case),
it reaches its minimum value at a level less than $P_2$. As $P_1$ increases from $P_c$, the
Type I error increases until, at $P_1 = 1$, the Type I error is equal to $P_2$.
The influence of $P_1$ on the power of a "sometimes pool" test is illustrated in
Figure 2. The gain in power, as a function of $\theta_{21}$, is presented for three Class A
tests. Since comparisons of power are made over tests having different Type I
errors, the gain is expressed as the proportion actually attained of the total
gain in power that is possible if the true value of $\theta_{21}$ is actually known. When
$P_1$ equals the critical value .77, the curve is observed to decrease monotonically to
zero. However, for lower values of $P_1$, the preliminary test prescribes pooling more
often, and more power is gained when $\theta_{21}$ is near one but less power is gained or
power is actually lost when $\theta_{21}$ is large.

The power gained or lost at various values of $\theta_{21}$ is illustrated in Table I.
The probability of rejecting the hypothesis for the "sometimes pool" test is
tabulated opposite "s.p.", and for the "never pool" test having the same Type I
error opposite "n.p.".

Fig. 1. Effect of Varying $P_1$. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$ and $P_2 = P_3 = .05$. (a) Upper
diagram: Class A Tests. (b) Lower diagram: Class B Tests. [Ordinate: Type I error. Abscissa: $\theta_{21}$.]

Fig. 2. Proportion of Possible Gain in Power Actually Attained. $n_1 = 20$, $n_2 = 4$, $n_3 = 2$,
$P_2 = P_3 = .05$. [Ordinate: gain in power.]

The last line of the table approaches the probabilities for a "never pool" test
having a Type I error of 5%. Except for values very near $(\theta_{21}, \theta_{32}) = (1, 1)$, the
probability of rejecting the null hypothesis, using a "sometimes pool" test, is
greater than if a "never pool" test at the 5% level is used. In this sense, the
"power" of the "sometimes pool" test is greater everywhere except near
$(\theta_{21}, \theta_{32}) = (1, 1)$.

TABLE I

Comparison of Power of a "Sometimes Pool" (s.p.) Test and Corresponding "Never Pool"
(n.p.) Tests

$n_1 = 20$, $n_2 = 4$, $n_3 = 2$; $P_1 = P_2 = P_3 = .05$

                       Type I Error     Value of $\theta_{32}$
$\theta_{21}$   Type of Test   ($\theta_{32} = 1$)   1.8    2.8    4.3    7.4    12.5   25     50     250

[Rows for the six smaller values of $\theta_{21}$, with Type I errors .048, .067, .102, .127, .146 and .148, are not legible in this copy.]

10     s.p.            .091             .152   .220   .320   .465   .621   .776   .877   .974
       n.p.            .091             .191   .300   .422   .569   .712   .838   .913   .982
16     s.p.            .067             .130   .209   .313   ...    .615   ...    ...    ...
       n.p.            .067             .149   .245   .361   .509   .662   .805   .895   .978
100    s.p.            .051             ...    .200   .307   .452   .613   .771   .875   .973
       n.p.            .051             .118   .201   .308   .454   .615   .773   .875   .973

Below the heavy line the s.p. test is less powerful than the n.p. test.
3.2. Effect of $P_2$, $P_3$ illustrated. The influence of the probability levels
employed in the final phase of a “sometimes pool” test is illustrated in Figure 3.
The main effect is observed to be the manner in which the behaviour is
constrained at the extreme values of $\theta_{21}$.

Fig. 3. Class A Tests; $n_1 = 20$, $n_2 = 4$, $n_3 = 2$.

[Figure 4: Type I error.]
Fig. 4. (a) Upper Diagram: Effect of Varying $n_2$. $P_1 = P_2 = P_3 = .05$ and $n_1 = 20$,
$n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$. $P_1 = P_2 = P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.


3.3. Effect of $n_1$, $n_2$ illustrated. The response of the Type I error to increases
in the degrees of freedom of the preliminary test is illustrated in Figure 4. The
maximum disturbance is observed to increase as $n_1$ increases or as $n_2$ decreases.

3.4. Class B test illustrated. The behaviour of the Type I error of some Class
B tests is illustrated in Figure 1(b). The hypothesis is always rejected less
frequently than if a “never pool” test at the $P_2$ level is used.










[Figure 5: Type I error. Upper diagram curves: $n_2 = 4$, $P_1 = .18$; $n_2 = 12$, $P_1 = .09$;
$n_2 = 20$, $P_1 = .06$. Lower diagram curves: $n_1 = 20$, $P_1 = .18$; $n_1 = 12$, $P_1 = .20$;
$n_1 = 4$, $P_1 = .26$.]
Fig. 5. (a) Upper Diagram: Effect of Varying $n_2$ when $F_1 = 2F_{.50}$, $P_2 = P_3 = .05$ and
$n_1 = 20$, $n_3 = 2$. (b) Lower Diagram: Effect of Varying $n_1$ when $F_1 = 2F_{.50}$, $P_2 = P_3 = .05$
and $n_2 = 4$, $n_3 = 2$.

Fig. 6. Effect of Varying $n_1$ when $P_2 > P_3$. $P_2 = .10$, $P_3 = .05$ and $n_2 = 4$, $n_3 = 2$.


3.5. Recommended procedure illustrated. Figure 5 illustrates the behaviour
of the Type I error when the recommended procedure is applied to the special
cases presented in Figure 4. When $n_1 = 12$, $n_2 = 4$, the 20% probability level is
prescribed and the Type I error never exceeds .09. When $n_1 = 20$, $n_2 = 20$, the
more liberal value of 6% is prescribed and the resulting Type I error never
exceeds .07. The more liberal choice of $P_1$ results in a greater gain of power,
near $\theta_{21} = 1$, than would have resulted if the 20% level had been used throughout.
A small loss in power occurs when $\theta_{21}$ is large. Should the experimenter wish to
guard against this loss in power for a larger range of values of $\theta_{21}$ near one, he
may do so, at the expense of a somewhat larger disturbance in the Type I error,
by choosing $P_2$ larger than $P_3$. In the present example, if $P_2$ is taken as .10
instead of .05, Figure 6 shows that the Type I error is changed only slightly for
values of $\theta_{21}$ near one, but the maximum disturbance is increased. Such a test is
uniformly more powerful than the “never pool” test for all values of $\theta_{21}$ for which
the Type I error is less than .10, a much larger range of values than in the
previous case.


IV. DERIVATIONS AND PROOFS 


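4.1. Derivation of joint frequency function. The joint frequency function of
the $v$'s is given by

$c_1\, v_1^{\frac{1}{2}n_1 - 1}\, v_2^{\frac{1}{2}n_2 - 1}\, v_3^{\frac{1}{2}n_3 - 1} \exp\left\{-\tfrac{1}{2}\left(\dfrac{n_1 v_1}{\sigma_1^2} + \dfrac{n_2 v_2}{\sigma_2^2} + \dfrac{n_3 v_3}{\sigma_3^2}\right)\right\}$,

where $c_1$ is independent of the $v$'s. Transform to new variables:

$u_1 = \dfrac{n_2 v_2}{n_1 v_1}$,   $u_2 = \dfrac{n_3 v_3}{n_2 v_2}$,

together with a third variable, which is then integrated out.

By integrating and evaluating the constant, the joint frequency function of
$u_1$ and $u_2$ is obtained:

(3)   $p(u_1, u_2) = \dfrac{\theta_{21}^{\frac{1}{2}n_1}\, \theta_{32}^{\frac{1}{2}(n_1 + n_2)}\, u_1^{\frac{1}{2}(n_2 + n_3) - 1}\, u_2^{\frac{1}{2}n_3 - 1}}{B(\frac{1}{2}n_1, \frac{1}{2}n_2)\, B(\frac{1}{2}n_3, \frac{1}{2}(n_1 + n_2))\, (\theta_{21}\theta_{32} + \theta_{32}u_1 + u_1u_2)^{\frac{1}{2}(n_1 + n_2 + n_3)}}$,

where $\theta_{21} = \sigma_2^2/\sigma_1^2$; $\theta_{32} = \sigma_3^2/\sigma_2^2$.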


4.2. Definition of critical region. The rule of procedure for the “sometimes
pool” test may now be expressed in terms of the $u$'s. Reject the hypothesis
$\theta_{32} = 1$ if

$\{u_1 \ge u_1^0,\ u_2 \ge u_2^0\}$   or   $\{u_1 < u_1^0,\ u_1u_2/(1 + u_1) \ge u_3^0\}$,

where

$u_1^0 = \dfrac{n_2}{n_1}\, F_1(n_2, n_1; P_1)$,

$u_2^0 = \dfrac{n_3}{n_2}\, F_2(n_3, n_2; P_2)$,

$u_3^0 = \dfrac{n_3}{n_1 + n_2}\, F_3(n_3, n_1 + n_2; P_3)$.

The reader will note that the $u$'s are ratios of sums of squares. The symbol $u_1$
is associated with the preliminary test. The final test when pooling is not
prescribed is associated with the symbol $u_2$, and when pooling is prescribed the
relevant statistic is $u_1u_2/(1 + u_1)$.
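
The rule of procedure can be written out directly. The sketch below is only an illustration: the function name and argument order are hypothetical, and the three F points are assumed to be looked up by the caller in standard tables (the values used in the example are approximate 5% points).

```python
def sometimes_pool_rejects(v1, v2, v3, n1, n2, n3, f1, f2, f3):
    """Return True when the "sometimes pool" rule rejects theta_32 = 1.

    v1, v2, v3 are the mean squares on n1, n2, n3 degrees of freedom.
    f1, f2, f3 are tabulated upper percentage points supplied by the caller:
    F1(n2, n1; P1), F2(n3, n2; P2) and F3(n3, n1 + n2; P3).
    """
    # Ratios of sums of squares, as in the transformation of Section 4.1.
    u1 = n2 * v2 / (n1 * v1)
    u2 = n3 * v3 / (n2 * v2)

    # Critical values u1^0, u2^0, u3^0.
    u1_crit = (n2 / n1) * f1
    u2_crit = (n3 / n2) * f2
    u3_crit = n3 / (n1 + n2) * f3

    if u1 >= u1_crit:
        # Preliminary test significant: do not pool, test v3 against v2.
        return u2 >= u2_crit
    # Preliminary test not significant: pool v1 and v2.
    return u1 * u2 / (1 + u1) >= u3_crit


# Illustrative call with n1 = 20, n2 = 4, n3 = 2 and approximate 5% points.
decision = sometimes_pool_rejects(1.0, 1.0, 4.0, 20, 4, 2, 2.87, 6.94, 3.40)
```

With $v_2/v_1 = 1$ the preliminary test is not significant, so the final test uses the pooled statistic $u_1u_2/(1 + u_1)$, which is equivalent to comparing $v_3$ with the pooled mean square.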

The critical region defined in this way is illustrated in the two dimensional
sample space $\{u_1, u_2\}$ of Figure 7(a). The critical regions of the “never pool”
and the “always pool” test are readily identified in this figure. The region of a
“never pool” test at the $P_2$ level is designated by $A + B_1 + C$, the area above
the line $u_2 = u_2^0$; and the region of an “always pool” test at the $P_3$ level is
designated by $B_1 + B_2 + C + D$, the area above the curve $u_1u_2 = u_3^0(1 + u_1)$.
The critical region of the “sometimes pool” test, $B_1 + B_2 + C$, may be considered
in two parts: the portion due to pooling, $B_1 + B_2$, and the portion due to not
pooling, $C$.

[Figure 7: the $(u_1, u_2)$ plane, showing the pooling curve $u_1u_2 = u_3^0(1 + u_1)$, the line $u_2 = u_2^0$ and the preliminary line $u_1 = u_1^0$.]
Fig. 7. Critical Region of “Sometimes Pool” Test. (a) Left: Class A Test: $u_1^0 > u_t$. (b)
Right: Class B Test: $u_1^0 < u_t$.


The probability of rejecting the null hypothesis is given by

(4)   $Q(\theta_{21}, \theta_{32}) = \displaystyle\int_0^{u_1^0}\!\!\int_w^{\infty} p\; du_2\, du_1 + \int_{u_1^0}^{\infty}\!\!\int_{u_2^0}^{\infty} p\; du_2\, du_1$,

where $p$ is the frequency function (3), and $w = u_3^0(1 + u_1)/u_1$.

Simple explicit expressions for these integrals cannot be obtained in general,
but when $n_3 = 2$ they can be reduced to forms containing incomplete beta
functions. This special case is dealt with in Subsection 4.7.


4.3. Critical value of $P_1$. The symbol $u_t$ in Figure 7 is used to denote the
$u_1$ coordinate of the point of intersection of the line $u_2 = u_2^0$ and the curve
$u_1u_2 = u_3^0(1 + u_1)$. Accordingly,

(5)   $u_1 = u_t = \dfrac{u_3^0}{u_2^0 - u_3^0}$,

a value readily determined for any given test. This relationship may be expressed
in terms of the $F$'s as

(6)   $F_t = \dfrac{n_1 F_3}{(n_1 + n_2)F_2 - n_2 F_3}$,

where $F_t$ is defined by $n_1u_t = n_2F_t$. The probability level corresponding to
$F_t$ is denoted by $P_t$.

The critical value $P_t$ is the value of $P_1$ which divides the possible “sometimes
pool” tests into two types having different properties. If $P_1$ is less than $P_t$ ($F_1 > F_t$
or $u_1^0 > u_t$), the test is referred to as a Class A test. If $P_1$ is greater than $P_t$ ($F_1 < F_t$
or $u_1^0 < u_t$), the test is referred to as a Class B test.
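
The classification can be checked numerically. The sketch below is illustrative only: the function name is hypothetical, the F points are assumed to come from standard tables (approximate 5% points are used), and the class is decided by comparing $u_1^0$ with $u_t = u_3^0/(u_2^0 - u_3^0)$.

```python
def classify_test(n1, n2, n3, f1, f2, f3):
    """Classify a "sometimes pool" test as Class A or Class B.

    f1, f2, f3 are tabulated points F1(n2, n1; P1), F2(n3, n2; P2) and
    F3(n3, n1 + n2; P3), supplied by the caller.
    Returns (class_label, u_t, F_t).
    """
    u1_crit = (n2 / n1) * f1
    u2_crit = (n3 / n2) * f2
    u3_crit = n3 / (n1 + n2) * f3
    if u2_crit <= u3_crit:
        # The line u2 = u2^0 and the pooling curve do not intersect.
        raise ValueError("u2^0 must exceed u3^0 for u_t to exist")
    u_t = u3_crit / (u2_crit - u3_crit)   # intersection abscissa
    f_t = n1 * u_t / n2                   # from n1 * u_t = n2 * F_t
    label = "A" if u1_crit > u_t else "B"
    return label, u_t, f_t


# n1 = 20, n2 = 4, n3 = 2 with approximate 5% points from F tables.
label, u_t, f_t = classify_test(20, 4, 2, 2.87, 6.94, 3.40)
```

For these inputs $u_1^0 = 0.574$ is well above $u_t$, so the test falls in Class A.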


4.4. Lemma 1.
Lemma 1. If $\theta_{21}' \ge \theta_{21}$ and $\theta_{32}' \ge \theta_{32}$, and if the equality applies in one of these,
then the ratio of the frequency functions (3)

(7)   $\dfrac{p(u_1, u_2 \mid \theta_{21}', \theta_{32}')}{p(u_1, u_2 \mid \theta_{21}, \theta_{32})}$

increases monotonically as (i) $u_1$ increases with $u_2$ fixed, or as (ii) $u_2$ increases
with $u_1$ fixed, or as (iii) $u_1$ increases on a fixed pooling curve $u_1u_2 = u_3(1 + u_1)$.

Proof. The ratio (7) is a monotonic function of

$\dfrac{\theta_{21}\theta_{32} + \theta_{32}u_1 + u_1u_2}{\theta_{21}'\theta_{32}' + \theta_{32}'u_1 + u_1u_2}$.

It is easily shown that an expression of the form $(a + bx)/(c + dx)$ increases
monotonically with respect to $x$ if $a/c < b/d$, and this condition holds for cases
(i), (ii), and (iii).
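
The monotonicity claim used in the proof can be verified in one line. The following is a sketch for positive $c$ and $d$, with case (ii) worked as an instance; the identification of $a$, $b$, $c$, $d$ is an assumption based on reading the ratio with unprimed parameters in the numerator and primed parameters in the denominator.

```latex
\frac{d}{dx}\,\frac{a + bx}{c + dx}
  = \frac{b(c + dx) - d(a + bx)}{(c + dx)^2}
  = \frac{bc - ad}{(c + dx)^2} > 0
  \iff \frac{a}{c} < \frac{b}{d} \qquad (c, d > 0).

% Case (ii): x = u_2 with u_1 fixed.
a = \theta_{21}\theta_{32} + \theta_{32}u_1, \quad b = u_1, \quad
c = \theta_{21}'\theta_{32}' + \theta_{32}'u_1, \quad d = u_1,
\qquad \frac{a}{c} < 1 = \frac{b}{d}.
```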
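4.5. Lemma 2.

Lemma 2. If area $L$ lies above a given pooling curve, and to the right of a given
preliminary line, if area $K$ lies below the same pooling curve, and to the left of the
same preliminary line, and if

$\Pr\{L \mid \theta_{21}, \theta_{32}\} \ge \Pr\{K \mid \theta_{21}, \theta_{32}\}$,

then

$\Pr\{L \mid \theta_{21}', \theta_{32}'\} > \Pr\{K \mid \theta_{21}', \theta_{32}'\}$,

where $\theta_{21}' \ge \theta_{21}$ and $\theta_{32}' \ge \theta_{32}$ and the equality applies in one of these.

Proof. For any point $(u_1', u_2')$ in $K$ and any point $(u_1'', u_2'')$ in $L$, Lemma 1
(iii) yields

$\dfrac{p(u_1', u_2^* \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2^* \mid \theta_{21}, \theta_{32})} \le \dfrac{p(u_1'', u_2'' \mid \theta_{21}', \theta_{32}')}{p(u_1'', u_2'' \mid \theta_{21}, \theta_{32})}$,

where $u_2^* = c(1 + u_1')/u_1'$, and $c$ is a constant defined by $u_2'' = c(1 + u_1'')/u_1''$.
Since $K$ is below a given pooling curve, $u_2' < u_2^*$ and

$\dfrac{p(u_1', u_2' \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2' \mid \theta_{21}, \theta_{32})} < \dfrac{p(u_1', u_2^* \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2^* \mid \theta_{21}, \theta_{32})}$.

Consider

$\dfrac{p(u_1', u_2' \mid \theta_{21}', \theta_{32}')}{p(u_1', u_2' \mid \theta_{21}, \theta_{32})} < b < \dfrac{p(u_1'', u_2'' \mid \theta_{21}', \theta_{32}')}{p(u_1'', u_2'' \mid \theta_{21}, \theta_{32})}$,

where $b$ is a constant such that the inequalities hold for all $(u_1', u_2')$ in $K$ and
all $(u_1'', u_2'')$ in $L$.
Integrating over the regions yields

$\Pr\{K \mid \theta_{21}', \theta_{32}'\} < b \cdot \Pr\{K \mid \theta_{21}, \theta_{32}\}$

and

$b \cdot \Pr\{L \mid \theta_{21}, \theta_{32}\} < \Pr\{L \mid \theta_{21}', \theta_{32}'\}$.

But

$\Pr\{K \mid \theta_{21}, \theta_{32}\} \le \Pr\{L \mid \theta_{21}, \theta_{32}\}$;

thus

$\Pr\{K \mid \theta_{21}', \theta_{32}'\} < \Pr\{L \mid \theta_{21}', \theta_{32}'\}$,

which completes the proof.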


4.6. General Properties.

Result 1. When $\theta_{21} = 1$, the Type I error of a Class A test is less than $P_3$.

Proof. In the notation of Fig. 7(a), the probability of falling in $B_1 + B_2 +
C + D$ is $P_3$ when $\theta_{21} = 1$ and $\theta_{32} = 1$. The region of rejection of the “sometimes
pool” test is smaller by $D$.

Result 2. When $\theta_{21} = 1$, the Type I error of a Class A test is greater than
$(1 - P_1)P_3$.

Proof. The statistics $u_1$ and $u_1u_2/(1 + u_1)$ are independent when $\theta_{21} = 1$ and
$\theta_{32} = 1$. Under these conditions, the probability of falling in $B_1 + B_2$, in the
notation of Fig. 7(a), is equal to the product of two incomplete beta functions
having the values $(1 - P_1)$ and $P_3$. Consequently, the Type I error is greater
than $(1 - P_1)P_3$.

Result 3. The Type I error approaches $P_2$ as $\theta_{21}$ approaches infinity.

Proof. The distribution becomes singular when $\theta_{21} = \infty$. The frequency
function approaches zero uniformly for any finite value of $u_1$ and approaches

$\dfrac{1}{B(\frac{1}{2}n_3, \frac{1}{2}n_2)}\, \dfrac{u_2^{\frac{1}{2}n_3 - 1}}{(1 + u_2)^{\frac{1}{2}(n_2 + n_3)}}$

at $u_1 = \infty$. When $\theta_{21} = \infty$, the entire mass is concentrated on the line $u_1 = \infty$
and is distributed as a beta variable along that line. In the notation of Fig.
7(a), $\Pr(B_1 + B_2) \to 0$ and $\Pr(C) \to P_2$.

Result 4. If the Type I error of a Class A test is $Q_0$ for $\theta_{21}$, then for $\theta_{21}' > \theta_{21}$,
the Type I error is greater than $r$, where $r$ is equal to the lesser of $Q_0$ and $P_2$.
Three useful corollaries are associated with the above result:

Result 4.1. If at $\theta_{21} = 1$, the value of the Type I error is less than $P_2$, this is
its minimum value for any $\theta_{21}$.

Result 4.2. If at $\theta_{21} = 1$, the Type I error is less than $P_2$, then as $\theta_{21}$ increases
from 1 the Type I error increases monotonically until $P_2$ is reached.

Result 4.3. If for some value of $\theta_{21}$ the Type I error is equal to or greater than
$P_2$, then for any larger value of $\theta_{21}$, the Type I error is greater than $P_2$.

Proof. Let the regions of Fig. 8 be denoted by $R_1 = A_1 + B_1 + C_1$ with
similar designations for $R_2$ and $R_3$. Let $R_4 = B_1 + B_2 + B_3 + B_4 + C_1 + C_2$.

If $r = Q_0$, let the non-pooling line between $R_1$ and $R_2$ in Fig. 8 correspond to
$Q_0$ for all $\theta_{21}$. Then $\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$, whence $\Pr\{B_2 + B_3 +
B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. By Lemma 2, we have for any $\theta_{21}' > \theta_{21}$,
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}', 1\} > \Pr\{A_1 \mid \theta_{21}', 1\}$ and $\Pr\{R_4 \mid \theta_{21}', 1\} >
\Pr\{R_1 \mid \theta_{21}', 1\} = Q_0$.

Fig. 8. Critical Regions for Result 4.

If $r = P_2$, let the non-pooling line at the lower boundary of $R_3$ in Fig. 8
correspond to $Q_0$ for all $\theta_{21}$. Then in the same way $\Pr\{B_4 \mid \theta_{21}, 1\} = \Pr\{A_1 +
A_2 + A_3 + C_3 \mid \theta_{21}, 1\}$ and $\Pr\{B_4 \mid \theta_{21}', 1\} > \Pr\{A_1 + A_2 + A_3 \mid \theta_{21}', 1\}$ by
Lemma 2. Thus $\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 + A_3 + B_3 \mid \theta_{21}', 1\}$ and
$\Pr\{R_4 \mid \theta_{21}', 1\} > \Pr\{R_1 + R_2 \mid \theta_{21}', 1\} = P_2$.

Result 5. For a Class B test, the Type I error is less than $P_2$ for all $\theta_{21}$.

Proof. Figure 7(b) illustrates the critical region of a Class B test. We have
$\Pr\{A + B + C_1 + C_2 + C_3\} = P_2$. But the region of rejection of the “sometimes
pool” test is smaller, excluding $A$.

Result 6. The Type I error of a Class B test, for $\theta_{21} = 1$, is greater than
$(1 - P_t)P_3$.

Proof. Changing $P_1$ to $P_t$ removes $C_2$ from the region of rejection in Fig.
7(b), thus decreasing the Type I error. The modified test lies in both Class B
and Class A, so that Result 2 applies.

Result 7. For any $\theta_{21}$, the Type I error is a minimum for changes of $P_1$ when
$P_1 = P_t$.

Proof. For a Class A test, changing $P_1$ to $P_t$ removes region $B_2$ of Fig. 7(a),
thus decreasing the Type I error. For a Class B test, changing $P_1$ to $P_t$ removes
region $C_2$ of Fig. 7(b), similarly decreasing the Type I error.

Result 8. A Class A test, in which the Type I error is less than or equal to $P_2$, is
more powerful than a “never pool” test having the same Type I error.

Proof. In Fig. 8, let region $R_1 = A_1 + B_1 + C_1$ be equal in size to $R_4 =
B_1 + B_2 + B_3 + B_4 + C_1 + C_2$. Then $\Pr\{R_4 \mid \theta_{21}, 1\} = \Pr\{R_1 \mid \theta_{21}, 1\}$ and
$\Pr\{B_2 + B_3 + B_4 + C_2 \mid \theta_{21}, 1\} = \Pr\{A_1 \mid \theta_{21}, 1\}$. Increasing $\theta_{32} = 1$ to $\theta_{32}'$ and
applying Lemma 2 yields $\Pr\{R_4 \mid \theta_{21}, \theta_{32}'\} > \Pr\{R_1 \mid \theta_{21}, \theta_{32}'\}$.

Result 9. For a fixed Type I error a Class A test, carried out at given levels of
$P_2$ and $P_3$, is more powerful than a Class B test at the same levels.

Proof. Fig. 7 and Lemma 2 apply at once.


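4.7. Closed form expressions for $n_3 = 2$. The probability of rejecting the
hypothesis in a “sometimes pool” test is given by $Q(\theta_{21}, \theta_{32}) = Q_1 + Q_2$, where
$Q_1$ corresponds to the region $B$, and $Q_2$ to the region $C$ of Fig. 7.

The integrals (4) representing the probability of rejecting the null hypothesis
reduce, when $n_3 = 2$, to

(8)   $Q_1 = \dfrac{I_x(\frac{1}{2}n_2, \frac{1}{2}n_1)}{\left(1 + \dfrac{u_3^0}{\theta_{21}\theta_{32}}\right)^{\frac{1}{2}n_1} \left(1 + \dfrac{u_3^0}{\theta_{32}}\right)^{\frac{1}{2}n_2}}$,

where the argument $x$ of the incomplete beta function is defined by $x = z/(1 + z)$,
where

(9)   $z = \dfrac{(1 + u_3^0/\theta_{32})\, u_1^0}{\theta_{21} + u_3^0/\theta_{32}}$.

Under the null hypothesis $\theta_{32} = 1$,

(10)   $Q_1 = I_x(\tfrac{1}{2}n_2, \tfrac{1}{2}n_1) \left[\dfrac{1 + u_3^0}{1 + u_3^0/\theta_{21}}\right]^{\frac{1}{2}n_1} \cdot P_3$,

since

$\dfrac{1}{(1 + u_3^0)^{\frac{1}{2}(n_1 + n_2)}} = P_3$.

Similarly

(11)   $Q_2 = \dfrac{I_{x'}(\frac{1}{2}n_1, \frac{1}{2}n_2)}{\left(1 + \dfrac{u_2^0}{\theta_{32}}\right)^{\frac{1}{2}n_2}}$,

where the argument $x'$ of the incomplete beta function is defined by $x' = 1/(1 + z')$,
where

(12)   $z' = \left(1 + \dfrac{u_2^0}{\theta_{32}}\right) \dfrac{u_1^0}{\theta_{21}}$.

Under the null hypothesis $\theta_{32} = 1$,

(13)   $Q_2 = I_{x'}(\tfrac{1}{2}n_1, \tfrac{1}{2}n_2) \cdot P_2$,

since

$\dfrac{1}{(1 + u_2^0)^{\frac{1}{2}n_2}} = P_2$.

The incomplete beta function has been tabulated by Pearson [4].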
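
The closed forms for $n_3 = 2$ can also be evaluated without tables. The sketch below rests on several assumptions: it reads (8) as $Q_1 = I_x(\frac12 n_2, \frac12 n_1)\,/\,[(1 + u_3^0/\theta_{21}\theta_{32})^{n_1/2}(1 + u_3^0/\theta_{32})^{n_2/2}]$ and (11) as $Q_2 = I_{x'}(\frac12 n_1, \frac12 n_2)\,/\,(1 + u_2^0/\theta_{32})^{n_2/2}$; it obtains $u_2^0$ and $u_3^0$ from the relations $(1 + u_2^0)^{-n_2/2} = P_2$ and $(1 + u_3^0)^{-(n_1 + n_2)/2} = P_3$; and it replaces Pearson's tables by a simple Simpson's-rule integration that is adequate only when both beta parameters are at least 1. All function names are hypothetical.

```python
import math


def inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by Simpson's rule (a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / steps

    def f(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    total = f(0.0) + f(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return total * h / 3 / beta


def rejection_probability(theta21, theta32, n1, n2, u1_crit, p2, p3):
    """Return (Q1, Q2) for n3 = 2; u1_crit is the preliminary critical value."""
    u2_crit = p2 ** (-2.0 / n2) - 1.0            # from (1 + u2)^(-n2/2) = P2
    u3_crit = p3 ** (-2.0 / (n1 + n2)) - 1.0     # from (1 + u3)^(-(n1+n2)/2) = P3

    z = (theta32 + u3_crit) * u1_crit / (theta21 * theta32 + u3_crit)
    q1 = inc_beta(z / (1 + z), n2 / 2, n1 / 2) / (
        (1 + u3_crit / (theta21 * theta32)) ** (n1 / 2)
        * (1 + u3_crit / theta32) ** (n2 / 2)
    )

    z_prime = (1 + u2_crit / theta32) * u1_crit / theta21
    q2 = inc_beta(1 / (1 + z_prime), n1 / 2, n2 / 2) / (
        (1 + u2_crit / theta32) ** (n2 / 2)
    )
    return q1, q2


# Type I error (theta32 = 1) of a Class A test with n1 = 20, n2 = 4.
q1, q2 = rejection_probability(1.0, 1.0, 20, 4, 0.5, 0.05, 0.05)
type_one_error = q1 + q2
```

At $\theta_{21} = \theta_{32} = 1$ the value lies between $(1 - P_1)P_3$ and $P_3$, as Results 1 and 2 require, and for very large $\theta_{21}$ it approaches $P_2$, as in Result 3.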
The author wishes to thank Professor W. G. Cochran and Professor John W. 
Tukey for helpful advice in the preparation of this paper. 
ESTIMATING THE MEAN AND VARIANCE OF NORMAL POPULATIONS
FROM SINGLY TRUNCATED AND DOUBLY TRUNCATED SAMPLES¹


By A. C. Cohen, Jr.
The University of Georgia


1. Summary. This paper is concerned with the problem of estimating the 
mean and variance of normal populations from singly and doubly truncated 
samples having known truncation points. Maximum likelihood estimating equa- 
tions are derived which, with the aid of standard tables of areas and ordinates 
of the normal frequency function, can be readily solved by simple iterative 
processes. Asymptotic variances and covariances of these estimates are ob- 
tained from the information matrices. Numerical examples are given which 
illustrate the practical application of these results. In Sections 3 to 8 inclusive, 
the following cases of doubly truncated samples are considered: I, number of 
unmeasured observations unknown; II, number of unmeasured observations in 
each ‘tail’ known; and III², total number of unmeasured observations known,
but not the number in each ‘tail’. In Section 9, singly truncated samples are 
treated as special cases of I and II above. 


2. Introduction. In practice, truncated samples arise with various types of 
experimental data in which recorded measurements are available over only a 
partial range of the variable. Such samples are usually classified according to 
the form of the population (complete) distribution; according to whether the 
truncation points are known or unknown; and according to whether the number 
of unmeasured (missing) observations is known or unknown. In this paper, the 
further classification of singly truncated or doubly truncated is made, accordingly 
as one or both ‘tails’ of the sample have been removed. Pearson and Lee [1, 2], 
Fisher [3], Hald [4]³, and this writer [5] studied singly truncated normal samples
with a known truncation point when the number of unmeasured observations is 
unknown. Stevens [6], Cochran [7], and Hald [4] studied similar samples with a 
known number of unmeasured observations. Stevens [6] also considered doubly 
truncated normal samples with known truncation points when the number of 
unmeasured observations in each ‘tail’ is known. In each of these papers, equa- 
tions were derived with which maximum likelihood estimates of the population 
mean and variance can be computed from samples of the type considered. 
With the exception of [5], which uses standard tables of the normal frequency
function, practical application of the various estimating equations involves
use of special tables which may frequently be unavailable.


1 Based on papers presented before the American Mathematical Society, Durham,
North Carolina, April 2, 1949, and before a joint meeting of the Institute of Mathematical
Statistics and the Biometric Society, Chapel Hill, North Carolina, March 18, 1950.

2 The problem involved in this case was recently called to the writer’s attention by
Churchill Eisenhart.

3 Reference [4] appeared while this paper was awaiting publication. Minor revisions have
been made in view of Hald’s results.


3. Case I. Number of unmeasured observations unknown. Let $x_0$ designate
the left truncation point, $x_0 + R$ the right truncation point, and hence $R$ the
sample range. Let $n_0$ be the number of measured observations with values equal to
or between the truncation points. In this case, the number of unmeasured
observations is assumed to be unknown. We translate the origin to the left terminus
by the change of variable $x = x' - x_0$, and designate the left and right truncation
points in standard units of the population (complete distribution) as $\xi'$ and $\xi''$,
respectively. We can write the probability density function for this case as

(1)   $f(x) = \dfrac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\; e^{-\frac{1}{2}(\xi' + x/\sigma)^2}, \qquad 0 \le x \le R,$

where

(2)   $I_0' = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{\xi'}^{\infty} e^{-t^2/2}\, dt, \qquad I_0'' = \dfrac{1}{\sqrt{2\pi}} \int_{\xi''}^{\infty} e^{-t^2/2}\, dt,$

and

(3)   $\mu = x_0 - \sigma\xi'$.

Thus $(I_0' - I_0'')$ is the area under the normal curve between ordinates erected at
$\xi'$ and $\xi''$ respectively. Moreover $(I_0' - I_0'') = P(x_0 \le x' \le x_0 + R)$. The likelihood
function for such a sample is

(4)   $P(x_1, x_2, \ldots, x_{n_0}) = \left[\dfrac{1}{(I_0' - I_0'')\,\sigma\sqrt{2\pi}}\right]^{n_0} e^{-\frac{1}{2}\sum_{1}^{n_0}(\xi' + x_i/\sigma)^2}$.

Since $R$ is the truncated range, and since $\xi'$ and $\xi''$ are in standard units,
we have

(5)   $\xi'' = \xi' + R/\sigma$.

It should be understood that $\xi'$ is considered throughout this paper as the
independent parameter of location. The mean, $\mu$, cf. (3), is a linear function of $\xi'$.

In the derivations which follow, we employ the Fisher $I_n$ functions, where
$I_0(\xi)$ is defined by (2) and

(6)   $I_n(\xi) = \displaystyle\int_{\xi}^{\infty} I_{n-1}(t)\, dt$,

and hence

$\dfrac{dI_n}{d\xi} = -I_{n-1}$.

These functions satisfy the recurrence formula

(7)   $(n + 1)I_{n+1} + \xi I_n - I_{n-1} = 0$.

$I_n(\xi)$ is ordinarily abbreviated to $I_n$ in this paper. Where no confusion seems
likely to occur, similar abbreviations are used for other functions of $\xi$.
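
The recurrence can be spot-checked numerically for $n = 0$. The sketch below assumes the recurrence reads $(n + 1)I_{n+1} + \xi I_n - I_{n-1} = 0$ and that $I_{-1}$ is the normal ordinate, which follows from $dI_n/d\xi = -I_{n-1}$ applied to the definition of $I_0$; the function names and the finite upper limit of integration are choices made for illustration.

```python
import math


def phi(x):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def i0(x):
    """I_0(x): area under the standard normal curve to the right of x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def i1_from_recurrence(x):
    """I_1(x) from the recurrence with n = 0: I_1 = I_{-1} - x * I_0."""
    return phi(x) - x * i0(x)


def i1_by_integration(x, upper=12.0, steps=4000):
    """I_1(x) from its definition, integrating I_0 by Simpson's rule."""
    h = (upper - x) / steps
    total = i0(x) + i0(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * i0(x + k * h)
    return total * h / 3.0


difference = i1_from_recurrence(0.5) - i1_by_integration(0.5)
```

The two evaluations agree to within the integration error, which is a quick way to confirm the sign conventions before the functions are used in the derivatives that follow.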

We now obtain certain relations for use in subsequent derivations. Equations
(2), (5), and (6) enable us to write

(8)   $\dfrac{\partial I_0'}{\partial \xi'} = -I_{-1}' = -\varphi(\xi'), \qquad \dfrac{\partial I_0''}{\partial \xi'} = -I_{-1}'' = -\varphi(\xi''), \qquad \dfrac{\partial I_0''}{\partial \sigma} = -\varphi'' \dfrac{\partial \xi''}{\partial \sigma},$

where $\varphi(\xi)$ is the ordinate of the normal frequency curve; i.e., $\varphi(\xi) = \dfrac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}$.
Ordinarily we abbreviate $\varphi(\xi')$ to $\varphi'$ and $\varphi(\xi'')$ to $\varphi''$. On differentiating (5)
we have

(9)   $\dfrac{\partial \xi''}{\partial \sigma} = -\dfrac{R}{\sigma^2}$,

and hence from (8)

$\dfrac{\partial I_0''}{\partial \sigma} = \dfrac{\varphi'' R}{\sigma^2}$.

Taking logarithms of (4), differentiating with the aid of (8) and (9), and
equating to zero, we obtain the maximum likelihood estimating equations

(10)   $\dfrac{\partial L}{\partial \xi'} = \dfrac{n_0(\varphi' - \varphi'')}{I_0' - I_0''} - \displaystyle\sum_{1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

       $\dfrac{\partial L}{\partial \sigma} = \dfrac{n_0 \varphi''}{I_0' - I_0''} \cdot \dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma} \displaystyle\sum_{1}^{n_0} \dfrac{x_i}{\sigma}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

If we define

(11)   $Z_1 = \dfrac{\varphi'}{I_0' - I_0''}, \qquad Z_2 = \dfrac{\varphi''}{I_0' - I_0''},$

and substitute these values in (10), the estimating equations become

(12)   $\sigma[Z_1 - Z_2 - \xi'] - \nu_1 = 0,$
       $\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - Z_2R/\sigma] - \nu_2 = 0,$

where $\nu_1$ and $\nu_2$ are the first and second sample moments referred to the left
terminus; i.e., $\nu_k = \sum_{1}^{n_0} x_i^k / n_0$.

To obtain the required estimates $\hat\sigma$ and $\hat\xi'$, it is necessary to solve the two
equations of (12) simultaneously. As illustrated in Section 7, this can be
accomplished without too much difficulty with the aid of the normal curve tables by
using a modified Newton-Raphson method for solving two equations in two
unknowns. This method is described in greater detail by Whittaker and Robinson
[8]. Note that $Z_1$ and $Z_2$, cf. (11), involve only the normal curve ordinates
$\varphi'$ and $\varphi''$ and the areas $I_0'$ and $I_0''$. Consequently they can be evaluated for any
desired values of $\xi'$ and $\sigma$ from standard tables of the normal frequency function.
To determine $\hat\mu$, substitute $\hat\sigma$ and $\hat\xi'$ in (3).

Throughout this paper, we designate the maximum likelihood estimates as
$\hat\mu$, $\hat\sigma$ and $\hat\xi'$ respectively, whereas corresponding population parameters are
designated as $\mu$, $\sigma$, and $\xi'$.

4. Case II. Number of unmeasured observations in each ‘tail’ known. Let
the truncation points, the origin of reference, and the number of measured
observations be designated as for Case I. If we let $n_1$ and $n_2$ be the number of
unmeasured observations in the left and right ‘tails’ respectively, the likelihood
function for a sample of this type is

(13)   $P(x_1, x_2, \ldots, x_{n_1 + n_0 + n_2}) = K(1 - I_0')^{n_1} (I_0'')^{n_2} \left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n_0} e^{-\frac{1}{2}\sum_{1}^{n_0}(\xi' + x_i/\sigma)^2}$,

where $K$ is a constant.
We take the logarithms of (13), differentiate with the help of (8) and (9), and
equate to zero to obtain the maximum likelihood estimating equations

(14)   $\dfrac{\partial L}{\partial \xi'} = n_1 \dfrac{\varphi'}{1 - I_0'} - n_2 \dfrac{\varphi''}{I_0''} - \displaystyle\sum_{1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

       $\dfrac{\partial L}{\partial \sigma} = n_2 \dfrac{\varphi''}{I_0''} \cdot \dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma} \displaystyle\sum_{1}^{n_0} \dfrac{x_i}{\sigma}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

Let

(15)   $Y_1 = \dfrac{n_1}{n_0} \cdot \dfrac{\varphi'}{1 - I_0'}, \qquad Y_2 = \dfrac{n_2}{n_0} \cdot \dfrac{\varphi''}{I_0''},$

and (14) can be written as

(16)   $\sigma[Y_1 - Y_2 - \xi'] - \nu_1 = 0,$
       $\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - Y_2R/\sigma] - \nu_2 = 0,$

where $\nu_1$ and $\nu_2$ are again the first and second sample moments referred to the
left terminus. The estimating equations (16) correspond to equations (12)
given for Case I, and the manner of solution is the same for both cases. $Y_1$ and
$Y_2$ for a given sample are functions of $\xi'$ and $\sigma$ only. They can be evaluated for
any desired values of these variables from ordinary normal curve tables. As in
Case I, the mean is estimated from (3).


5. Case III. Total number of unmeasured observations known, but not the
number in each tail. Again, let the truncation points, the origin of reference,
and the number of measured observations be designated as in the two previous
cases. Let $N$ be the total sample size and hence $N - n_0$ the combined number of
unmeasured observations in both tails. In the notation of Case II, $N - n_0 =
n_1 + n_2$. The likelihood function for a sample of this type is

(17)   $P(x_1, x_2, \ldots, x_N) = K(1 - I_0' + I_0'')^{N - n_0} \left(\dfrac{1}{\sigma\sqrt{2\pi}}\right)^{n_0} e^{-\frac{1}{2}\sum_{1}^{n_0}(\xi' + x_i/\sigma)^2}$.

Taking logarithms of (17), differentiating with the assistance of (8) and (9) and
equating to zero, we obtain the maximum likelihood estimating equations

(18)   $\dfrac{\partial L}{\partial \xi'} = (N - n_0)\, \dfrac{\varphi' - \varphi''}{1 - I_0' + I_0''} - \displaystyle\sum_{1}^{n_0}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0,$

       $\dfrac{\partial L}{\partial \sigma} = (N - n_0)\, \dfrac{\varphi''}{1 - I_0' + I_0''} \cdot \dfrac{R}{\sigma^2} - \dfrac{n_0}{\sigma} + \dfrac{1}{\sigma} \displaystyle\sum_{1}^{n_0} \dfrac{x_i}{\sigma}\left(\xi' + \dfrac{x_i}{\sigma}\right) = 0.$

In this instance, let

(19)   $Q_1 = \left(\dfrac{N - n_0}{n_0}\right) \dfrac{\varphi'}{1 - I_0' + I_0''}, \qquad Q_2 = \left(\dfrac{N - n_0}{n_0}\right) \dfrac{\varphi''}{1 - I_0' + I_0''},$

and (18) can be written as

(20)   $\sigma[Q_1 - Q_2 - \xi'] - \nu_1 = 0,$
       $\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - Q_2R/\sigma] - \nu_2 = 0.$

It will be recognized that equations (20) correspond to (12) and (16) for Cases
I and II respectively. Since the manner of solving the estimating equations is
identical in all three cases, it will not be discussed further here. For any given
sample, $Q_1$ and $Q_2$ are functions of $\xi'$ and $\sigma$ only, and they can be evaluated for
any desired values of these arguments from standard normal curve tables. In
this case also, the mean is estimated from equation (3).


6. First approximations.
Case I. In this case, the following relations will usually provide satisfactory
first approximations for estimating $\sigma$ and $\xi'$:

(21)   $\sigma_1 = s_x, \qquad \xi_1' = -\nu_1/s_x,$

where $s_x^2$ is the sample variance; i.e., $s_x^2 = (\nu_2 - \nu_1^2)$. It should be remarked
that the only penalty involved in beginning with a poor first approximation is
to increase slightly the number of steps necessary before arriving at a satisfactory
final approximation by the method of Section 7.

Case II. Since $n_1$ and $n_2$ are known in this case, it is more expedient to read
first approximations to $\xi'$ and $\xi''$ directly from standard tables of normal curve
areas where we set

(22)   $\dfrac{n_1}{n_1 + n_0 + n_2} = 1 - I_0' = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{-\infty}^{\xi'} e^{-t^2/2}\, dt,$

(23)   $\dfrac{n_2}{n_1 + n_0 + n_2} = I_0'' = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{\xi''}^{\infty} e^{-t^2/2}\, dt.$

With $\xi_1'$ and $\xi_1''$ determined from (22) and (23), we obtain a first approximation
for estimating $\sigma$ from equation (5), which we now write as

(24)   $\sigma_1 = R/(\xi_1'' - \xi_1')$.

Case III. For a first approximation in this case, it will usually be satisfactory,
in the absence of contrary information, to assume that the unmeasured
observations are divided equally between the two tails, and then proceed as in Case II.
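
The first approximations can be computed directly instead of being read from tables. The sketch below is an illustration under stated assumptions: it takes (21) as $\sigma_1 = s_x$, $\xi_1' = -\nu_1/s_x$ and (22) to (24) as inverse normal areas followed by $\sigma_1 = R/(\xi_1'' - \xi_1')$; the function names are hypothetical, and the inputs are the sample values quoted in Section 7.

```python
import math
from statistics import NormalDist


def first_approx_case1(nu1, nu2):
    """Case I: sigma_1 = s_x and xi'_1 = -nu1 / s_x, with s_x^2 = nu2 - nu1^2."""
    s_x = math.sqrt(nu2 - nu1 ** 2)
    return s_x, -nu1 / s_x


def first_approx_case2(n0, n1, n2, sample_range):
    """Case II: invert the tail areas n1/N and n2/N, then sigma_1 = R/(xi'' - xi').

    Both n1 and n2 must be positive, otherwise a truncation point is infinite.
    """
    total = n0 + n1 + n2
    std_normal = NormalDist()
    xi_left = std_normal.inv_cdf(n1 / total)          # area to the left is n1/N
    xi_right = std_normal.inv_cdf(1.0 - n2 / total)   # area to the right is n2/N
    return sample_range / (xi_right - xi_left), xi_left, xi_right


# Sample of Section 7: n0 = 32, nu1 = 1.244625, nu2 = 2.105275, R = 2.75.
sigma_case1, xi_case1 = first_approx_case1(1.244625, 2.105275)
sigma_case2, xi_left2, xi_right2 = first_approx_case2(32, 7, 1, 2.75)
# Case III: assume the 8 unmeasured observations split equally, 4 in each tail.
sigma_case3, xi_left3, xi_right3 = first_approx_case2(32, 4, 4, 2.75)
```

The values agree with the first approximations used in Section 7 up to the rounding of the tabular entries.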


7. Numerical examples. As previously mentioned, a modified Newton-Raphson
method for solving two equations in two unknowns is satisfactory in
each of the three cases considered, for solving the estimating equations to obtain
$\hat\sigma$ and $\hat\xi'$ in practical applications. A random sample from a normal population
with $\mu = 0$ and $\sigma = 1$, selected from Mahalanobis's tables [9], will serve to
illustrate the solution in each case.

Case I. For the sample selected, $n_0 = 32$; $\nu_1 = 1.244625$; $\nu_2 = 2.105275$;
$x_0 = -1.000000$; and $R = 2.750000$. The estimating equations to be solved
simultaneously for $\xi'$ and $\sigma$ are thus

$\sigma[Z_1 - Z_2 - \xi'] - 1.244625 = 0,$
$\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - 2.750000\, Z_2/\sigma] - 2.105275 = 0.$

For first approximations, we employ (21) to obtain $\sigma_1 = s_x = 0.75$ and $\xi_1' =
-1.244625/0.75 = -1.66$. Beginning with these approximations, we subsequently
obtain the results displayed in Table 1.

TABLE 1
Solution of estimating equations in Case I

$\sigma$          $\xi'$ from $\nu_1$     $\xi'$ from $\nu_2$     Difference
1.536313     -0.5389          -0.5387          -0.0002
1.527778     -0.5455          -0.5460          +0.0005

Interpolating in this table, we obtain $\hat\sigma = 1.534$ and $\hat\xi' = -0.541$. On substituting
these values in (3) we obtain $\hat\mu = -0.170$. Even though the first approximations
in this instance proved to be considerably in error, no appreciable increase was
experienced in the number of steps necessary to arrive at the final values given.
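
The Case I solution can be reproduced numerically. The sketch below is not the modified Newton-Raphson method of the text: it mirrors the layout of Table 1 instead, solving each estimating equation for $\xi'$ at a fixed $\sigma$ by bisection and then bisecting on $\sigma$ until the two values of $\xi'$ agree. It assumes the estimating equations read $\sigma[Z_1 - Z_2 - \xi'] = \nu_1$ and $\sigma^2[1 - \xi'(Z_1 - Z_2 - \xi') - Z_2R/\sigma] = \nu_2$, and the brackets for $\xi'$ and $\sigma$ are choices made for this sample, not part of the paper.

```python
import math


def phi(t):
    """Ordinate of the standard normal curve."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def upper_area(t):
    """I_0(t): area under the standard normal curve to the right of t."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def z_terms(xi1, sigma, sample_range):
    """Z_1 and Z_2 for left point xi1 and right point xi1 + R/sigma."""
    xi2 = xi1 + sample_range / sigma
    area = upper_area(xi1) - upper_area(xi2)
    return phi(xi1) / area, phi(xi2) / area


def first_equation(xi1, sigma, sample_range, nu1):
    z1, z2 = z_terms(xi1, sigma, sample_range)
    return sigma * (z1 - z2 - xi1) - nu1


def second_equation(xi1, sigma, sample_range, nu2):
    z1, z2 = z_terms(xi1, sigma, sample_range)
    bracket = 1.0 - xi1 * (z1 - z2 - xi1) - z2 * sample_range / sigma
    return sigma ** 2 * bracket - nu2


def bisect(func, low, high, iterations=80):
    """Plain bisection; requires a sign change on [low, high]."""
    f_low = func(low)
    if f_low * func(high) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        f_mid = func(mid)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return 0.5 * (low + high)


def solve_case1(sample_range, nu1, nu2, x0, sigma_low, sigma_high):
    """Return (sigma_hat, xi_hat, mu_hat) for a Case I sample."""
    def xi_from_nu1(sigma):
        return bisect(lambda x: first_equation(x, sigma, sample_range, nu1), -5.0, 5.0)

    def xi_from_nu2(sigma):
        return bisect(lambda x: second_equation(x, sigma, sample_range, nu2), -5.0, 5.0)

    # The "Difference" column of Table 1 changes sign at the solution.
    sigma_hat = bisect(lambda s: xi_from_nu1(s) - xi_from_nu2(s), sigma_low, sigma_high)
    xi_hat = xi_from_nu1(sigma_hat)
    return sigma_hat, xi_hat, x0 - sigma_hat * xi_hat


sigma_hat, xi_hat, mu_hat = solve_case1(2.75, 1.244625, 2.105275, -1.0, 1.4, 1.7)
```

Bisection is slower than Newton-Raphson but needs no derivatives and cannot diverge once the root is bracketed, which makes it a convenient check on the tabular interpolation.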

Case II. Solution of estimating equations (16) for this case can also be
illustrated with the same sample which was used in Case I. In this instance, however,
we have the additional information: $n_1 = 7$ and $n_2 = 1$. The equations to be
solved are:

$\sigma[Y_1 - Y_2 - \xi'] - 1.244625 = 0,$
$\sigma^2[1 - \xi'(Y_1 - Y_2 - \xi') - 2.750000\, Y_2/\sigma] - 2.105275 = 0.$

From (22), (23) and (24) we obtain the first approximations: $\xi_1' = -0.935$;
$\xi_1'' = 1.960$; and hence $\sigma_1 = 0.950$. Beginning with these values, we proceed as
in Case I, and after several trials obtain the results displayed in Table 2.

TABLE 2
Solution of estimating equations in Case II

$\sigma$          $\xi'$ from $\nu_1$     $\xi'$ from $\nu_2$     Difference
1.041667     -0.9381          -0.9360          -0.0021
1.000000     -0.9820          -1.0094          +0.0274

Interpolating, we have $\hat\sigma = 1.039$ and $\hat\xi' = -0.941$. From (3) we then obtain
$\hat\mu = -0.022$.


Case III. Again we use the same sample that was employed to illustrate
Cases I and II. In this instance, however, we assume that the only information
available about the unmeasured observations is that their total number is 8.
In the notation of Section 5, we have $N = 40$, $n_0 = 32$, and hence $N - n_0 = 8$.
The estimating equations in this situation are

$\sigma[Q_1 - Q_2 - \xi'] - 1.244625 = 0,$
$\sigma^2[1 - \xi'(Q_1 - Q_2 - \xi') - 2.750000\, Q_2/\sigma] - 2.105275 = 0.$

Under the assumption that 4 unmeasured observations are in each ‘tail’,
equations (22), (23) and (24) give first approximations: $\xi_1' = -1.28$; $\xi_1'' = 1.28$;
and hence $\sigma_1 = 1.074$. Starting with these values and proceeding as in the two
previous cases, we obtain the results displayed in Table 3.

TABLE 3
Solution of estimating equations in Case III

$\sigma$          $\xi'$ from $\nu_1$     $\xi'$ from $\nu_2$     Difference
1.000000     -1.0794          -1.2091          +0.1297
1.100000     -1.0118          -0.9739          -0.0379

By interpolation, we have $\hat\sigma = 1.077$ and $\hat\xi' = -1.027$. From equation (3),
we then compute $\hat\mu = 0.106$.


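8. Precision of estimates. To determine asymptotic variances of $\hat\sigma$ and $\hat\xi'$, we
construct the variance-covariance matrices. This requires that we obtain the
second partial derivatives of logarithms of the likelihood function in each of
the three cases considered. Results stated in (8) and (9) are involved in these
derivatives.

Case I. The second partial derivatives in this case are

(25)   $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0 f_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi' \partial \sigma} = \dfrac{n_0}{\sigma} f_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2} f_3(\xi', \xi''),$

where

(26)   $f_1(\xi', \xi'') = -[1 + \xi'Z_1 - \xi''Z_2 - (Z_1 - Z_2)^2],$

       $f_2(\xi', \xi'') = \dfrac{R}{\sigma} Z_2[(Z_1 - Z_2) - \xi''] + [Z_1 - Z_2 - \xi'],$

       $f_3(\xi', \xi'') = \left(\dfrac{R}{\sigma}\right)^2 Z_2(Z_2 + \xi'') - \left[2 - \xi'(Z_1 - Z_2 - \xi') - Z_2\dfrac{R}{\sigma}\right].$

Subsequently we obtain

(27)   $V(\hat\sigma) = \dfrac{\sigma^2}{n_0}\left[\dfrac{-f_1}{f_1f_3 - f_2^2}\right], \qquad V(\hat\xi') = \dfrac{1}{n_0}\left[\dfrac{-f_3}{f_1f_3 - f_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{f_2}{\sqrt{f_1f_3}}.$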

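Case II. In this case the second partial derivatives are

(28)   $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0\, g_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi' \partial \sigma} = \dfrac{n_0}{\sigma}\, g_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2}\, g_3(\xi', \xi''),$

where

(29)   $g_1(\xi', \xi'') = -\left[1 + \xi'Y_1 - \xi''Y_2 + \dfrac{n_0}{n_2} Y_2^2 + \dfrac{n_0}{n_1} Y_1^2\right],$

       $g_2(\xi', \xi'') = \dfrac{R}{\sigma} Y_2\left[\dfrac{n_0}{n_2} Y_2 - \xi''\right] + [Y_1 - Y_2 - \xi'],$

       $g_3(\xi', \xi'') = \left(\dfrac{R}{\sigma}\right)^2 Y_2\left(\xi'' - \dfrac{n_0}{n_2} Y_2\right) - \left[2 - \xi'(Y_1 - Y_2 - \xi') - Y_2\dfrac{R}{\sigma}\right].$

Finally we can write

(30)   $V(\hat\sigma) = \dfrac{\sigma^2}{n_0}\left[\dfrac{-g_1}{g_1g_3 - g_2^2}\right], \qquad V(\hat\xi') = \dfrac{1}{n_0}\left[\dfrac{-g_3}{g_1g_3 - g_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{g_2}{\sqrt{g_1g_3}}.$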

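Case III. This time, the second partial derivatives are

(31)   $\dfrac{\partial^2 L}{\partial \xi'^2} = n_0\, h_1(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \xi' \partial \sigma} = \dfrac{n_0}{\sigma}\, h_2(\xi', \xi''), \qquad \dfrac{\partial^2 L}{\partial \sigma^2} = \dfrac{n_0}{\sigma^2}\, h_3(\xi', \xi''),$

where

(32)   $h_1(\xi', \xi'') = -\left[1 + \xi'Q_1 - \xi''Q_2 + \dfrac{n_0}{N - n_0}(Q_1 - Q_2)^2\right],$

       $h_2(\xi', \xi'') = -\dfrac{R}{\sigma} Q_2\left[\dfrac{n_0}{N - n_0}(Q_1 - Q_2) + \xi''\right] + [Q_1 - Q_2 - \xi'],$

       $h_3(\xi', \xi'') = \left(\dfrac{R}{\sigma}\right)^2 Q_2\left(\xi'' - \dfrac{n_0}{N - n_0} Q_2\right) - \left[2 - \xi'(Q_1 - Q_2 - \xi') - Q_2\dfrac{R}{\sigma}\right].$

Accordingly we obtain

(33)   $V(\hat\sigma) = \dfrac{\sigma^2}{n_0}\left[\dfrac{-h_1}{h_1h_3 - h_2^2}\right], \qquad V(\hat\xi') = \dfrac{1}{n_0}\left[\dfrac{-h_3}{h_1h_3 - h_2^2}\right], \qquad r_{\hat\sigma, \hat\xi'} = \dfrac{h_2}{\sqrt{h_1h_3}}.$

Note that variances of the estimates for each case considered can be computed
for given values of $\xi'$ and $\sigma$ from standard normal tables of areas and ordinates.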


9. Singly truncated samples. If only the left ‘tail’ is missing from the samples
thus far considered, then $\xi'' = \infty$, $n_2 = 0$, $\varphi'' = 0$, $I_0'' = 0$, and hence $Z_2$, $Y_2$,
and $Q_2$ each equal zero. Upon substituting these values in (12), (16), and (20)
respectively, estimating equations applicable to singly truncated samples are
obtained as special cases of the estimating equations for doubly truncated
samples. Of course Cases II and III become identical when samples are singly
truncated. When $Y_2 = Q_2 = 0$, then $Y_1 = Q_1$, cf. (15) and (19).

Case I. With $Z_2 = 0$, the estimating equations (12) become

(34)   $\sigma[Z_1 - \xi'] = \nu_1,$
       $\sigma^2[1 - \xi'(Z_1 - \xi')] = \nu_2.$

Eliminating $\sigma$ between these two equations we have

(35)   $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Z_1 - \xi'}\left[\dfrac{1}{Z_1 - \xi'} - \xi'\right],$

which is recognized as the Pearson-Lee-Fisher equation in a form which was
previously given by the author [5].

Case II. With $Y_2 = 0$, the estimating equations (16) become

(36)   $\sigma[Y_1 - \xi'] = \nu_1,$
       $\sigma^2[1 - \xi'(Y_1 - \xi')] = \nu_2.$

Eliminating $\sigma$ between the above equations, we obtain

(37)   $\dfrac{\nu_2}{\nu_1^2} = \dfrac{1}{Y_1 - \xi'}\left[\dfrac{1}{Y_1 - \xi'} - \xi'\right],$
which is in a form completely analogous to (35). Furthermore, this equation
can be solved for $\xi'$ in the same manner as (35), cf. [5]. Since $\sigma$ can be eliminated
between estimating equations in singly truncated cases, but not in doubly
truncated cases, the numerical computations are much simpler and less laborious
for singly truncated samples.

[Figure 1 (plot not reproduced): weighting factors; one curve for $n/N$ known and one for $n/N$ unknown.]

Fig. 1. Weighting factors for use in determining the variance of $\hat\sigma$.
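The elimination of $\sigma$ in the singly truncated case lends itself to a one-dimensional root search. The following is a minimal sketch, not the author's tabular procedure: it assumes a left-truncated sample, takes `nu1` and `nu2` to be the first and second sample moments about the truncation point, uses $Z(\xi) = \varphi(\xi)/[1 - \Phi(\xi)]$, and relies on the (assumed) monotonicity of the moment ratio in $\xi'$ over the search interval. The function names and the bracketing interval are illustrative choices.

```python
import math

def hazard(xi):
    """Z(xi) = phi(xi) / (1 - Phi(xi)) for the standard normal distribution."""
    phi = math.exp(-0.5 * xi * xi) / math.sqrt(2.0 * math.pi)
    tail = 0.5 * math.erfc(xi / math.sqrt(2.0))
    return phi / tail

def moment_ratio(xi):
    """Right-hand side of the eliminated equation: [1 - xi (Z - xi)] / (Z - xi)^2."""
    d = hazard(xi) - xi
    return (1.0 - xi * d) / (d * d)

def solve_left_truncated(nu1, nu2, lo=-8.0, hi=8.0, tol=1e-12):
    """Bisection for xi' such that moment_ratio(xi') = nu2 / nu1^2, then sigma.

    Assumes moment_ratio is increasing on [lo, hi] (an assumption of this sketch).
    """
    target = nu2 / (nu1 * nu1)
    if not (moment_ratio(lo) < target < moment_ratio(hi)):
        raise ValueError("moment ratio outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    xi = 0.5 * (lo + hi)
    sigma = nu1 / (hazard(xi) - xi)   # from sigma (Z - xi') = nu1
    return xi, sigma
```

Given the terminus $x_0'$, the mean would then follow from the definition $\xi' = (x_0' - m)/\sigma$, which is assumed here to match the paper's earlier sections.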
If the right rather than the left tail is missing from singly truncated samples,
applicable estimating equations can be obtained from (12) and (16) by translating
the origin to the terminus on the right and setting $Z_1$ and $Y_1$ equal to zero
rather than $Z_2$ and $Y_2$.


[Figure 2 (plot not reproduced): vertical axis, weighting factors (ticks from 0.7 to 100); horizontal axis, $\xi'$ from −3 to 3; curves for $w$ ($n/N$ known), $w'$ ($n/N$ unknown), and $w''$ (estimated from $n/N$ only).]

Fig. 2. Weighting factors for use in determining the variances of $\hat\xi'$ and $\xi^*$.
The variance formulas (25) and (28) likewise assume simpler forms with
singly truncated samples. On substituting $Z_2 = 0$ in (25), the variance formulas
applicable with singly truncated samples when the number of unmeasured
observations is unknown become identical in form with those previously given
by the writer in [5]. When the number of unmeasured observations in a singly
truncated sample is known, the applicable variance formulas (28), on setting
$Y_2 = 0$, become


(38) V(@) =<W@) and Ve) == w(t, 


where W and w may be regarded as weighting functions defined by 


1 + Yiu(¥Y, No/N1 + ®) 


69) WE) = BI eM — Oil + Yu(Yime/m + 1 — a — eT 


and 


de 3 ~— #7, — #) 
40) ¥@) = BW, — HI + Kim /m +E — 
Similarly, the correlation between sampling errors of $\hat\sigma$ and $\hat\xi'$ in this case becomes
sop: idemieaiannaciaaeacacmatl NT ia 

V[2— 2(%1 — eI + ¥i(¥ino/m + €))" 

A comparison of the variances (38) with those applicable when the number of
unmeasured observations is unknown serves to indicate the extent to which
information contained in a singly truncated sample is increased by adding
knowledge of the number of unmeasured observations. To facilitate such comparisons,
$W$, $w$, and the corresponding functions $W'$ and $w'$ applicable when the
number of unmeasured observations is unknown are displayed graphically in
Figures 1 and 2. In computing the plotted values of $W$ and $w$, the ratio $n/N$
in (39) and (40) was replaced by $I_0$. This ratio is, of course, an estimate of $I_0$,
and for $n$ and $N$ sufficiently large, the substitution is amply justified. Equations
for $W'$ and $w'$ can be found in [5]. For further comparisons, a graph of $w''$,
applicable in determining the variance $V(\xi^*)$, where $\xi^*$ is estimated from $n/N$ alone,
is also included in Figure 2. This latter function is defined as


(41) 1s,é7 


(42)  $w''(\xi^*) = \dfrac{I_0(1 - I_0)}{\varphi^2}.$
It follows from the well known formula for the variance of &*: 
(43) Vee) = 5 ee — OO LO — 
$2 n ge 
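As an illustration of estimating $\xi^*$ from $n/N$ alone, the sketch below solves $1 - \Phi(\xi^*) = n/N$ and evaluates a large-sample variance by propagating the binomial variance of $n/N$ through that relation. It assumes a left-truncated sample, so that $I_0$ is the area above the terminus, and it assumes the variance takes the form $w''/N$; the printed form of (43) is not legible here, so that scaling is an assumption of this sketch. The function name is hypothetical.

```python
from statistics import NormalDist

def xi_star_and_variance(n, N):
    """Estimate xi* from the retained fraction n/N; return (xi*, w'', w''/N)."""
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    std = NormalDist()
    i0 = n / N                           # estimate of I_0 (area above the terminus)
    xi_star = std.inv_cdf(1.0 - i0)      # solves 1 - Phi(xi*) = n/N
    phi = std.pdf(xi_star)               # normal ordinate at xi*
    w2 = i0 * (1.0 - i0) / phi ** 2      # weighting factor w''
    return xi_star, w2, w2 / N           # assumed large-sample variance: w'' / N
```

For example, `xi_star_and_variance(90, 100)` corresponds to a sample in which the lowest tenth of the population is unmeasured.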


An examination of Figures 1 and 2 discloses that, except when the omitted
portion of the distribution is small ($\xi' < -3$), the variances of the estimates of
$\sigma$ and $\xi'$ based on singly truncated normal samples are substantially less when
the number of unmeasured observations is known than when this information
is lacking.
THE ASYMPTOTIC PROPERTIES OF ESTIMATES OF THE
PARAMETERS OF A SINGLE EQUATION IN A COMPLETE
SYSTEM OF STOCHASTIC EQUATIONS¹,²


By T. W. ANDERSON³ AND HERMAN RUBIN⁴
Columbia University and Institute for Advanced Study 


1. Summary. In a previous paper [2] the authors have given a method for 
estimating the coefficients of a single equation in a complete system of linear 
stochastic equations. In the present paper the consistency of the estimates and 
the asymptotic distributions of the estimates and the test criteria are studied 
under conditions more general than those used in the derivation of these estimates 
and criteria. The point estimates, which can be obtained as maximum likelihood 
estimates under certain assumptions including that of normality of disturbances, 
are consistent even if the disturbances are not normally distributed and (a) some 
predetermined variables are neglected (Theorem 1) or (b) the single equation is 
in a non-linear system with certain properties (Theorem 2). 

Under certain general conditions (normality of the disturbances not being 
required) the estimates are asymptotically normally distributed (Theorems 3 
and 4). The asymptotic covariance matrix is given for several cases. The criteria 
derived in [2] for testing the hypothesis of over-identification have, asymptotically,
$\chi^2$-distributions (Theorem 5). The exact confidence regions developed
in [2] for the case that all predetermined variables are exogenous (that is, that 
the difference equations are of zero order) are shown to be consistent and to hold 
asymptotically even when this assumption is not true (Theorem 6). 

2. Introduction. The complete system of linear stochastic equations con- 
sidered by the authors in [2] was written 


(2.1)  $B_{yy}\, y_t' + \Gamma_{yz}\, z_t' = \epsilon_t',$


where $y_t$ is a row vector of $G$ jointly dependent variables at “time” $t$, $z_t$ is a row
vector of $K$ variables predetermined at $t$, $\epsilon_t$ is a row vector of “disturbances,”
and $B_{yy}$ and $\Gamma_{yz}$ are matrices. If $B_{yy}$ is non-singular the distribution of $\epsilon_t$ induces
the distribution of $y_t$ given $z_t$.

¹ This paper will be included in Cowles Commission Papers, New Series, No. 36.

² The results of this paper were presented to meetings of the Institute of Mathematical
Statistics at Washington, D. C., April 12, 1946 (Washington Chapter) and at Ithaca, New
York, August 23, 1946. Most of the research was done at the Cowles Commission for Research
in Economics; the authors are indebted to the members of the Cowles Commission
staff for many helpful discussions.

³ Fellow of the John Simon Guggenheim Memorial Foundation; Research Consultant
of the Cowles Commission for Research in Economics.

⁴ National Research Fellow; Research Consultant of the Cowles Commission for Research
in Economics.

One component equation of (2.1) was given special treatment. Let $\beta$ be
composed of the coefficients of the coordinates of $y_t$ which are not assumed
zero in the specified equation, and let $x_t$ be composed of the corresponding
components of $y_t$; similarly let $\gamma$ be composed of the coefficients of the coordinates
of $z_t$ which are not assumed zero, and $u_t$ the corresponding components of $z_t$;
and let $\zeta_t$ be the component of $\epsilon_t$ associated with the specified equation. Then
the single equation is


(2.2)  $\beta x_t' + \gamma u_t' = \zeta_t.$
Suppose we have a set of observations $x_t$, $z_t$, $t = 1, \cdots, T$. For sets of any
two vectors $a_t$ and $b_t$, let the second-order moment matrix be

(2.3)  $M_{ab} = \dfrac{1}{T}\sum_{t=1}^{T} a_t' b_t.$


Let $s_t$ be some linear transform of $v_t$, the set of coordinates of $z_t$ not contained in
$u_t$, chosen so $M_{su} = 0$. Defining


(2.4)  $W_{xx} = M_{xx} - M_{xz} M_{zz}^{-1} M_{zx},$


and assuming $\epsilon_t$ normally distributed with mean 0, covariance matrix $\Sigma$, and
independently of $\epsilon_{t'}$ $(t \neq t')$, we find $\hat\beta$, the maximum likelihood estimate of $\beta$,
to be proportional to a vector defined by


(2.5)  $(M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx})\, b' = 0,$

taking $\nu$ as the smallest root of

(2.6)  $|M_{xs} M_{ss}^{-1} M_{sx} - \nu W_{xx}| = 0.$

The vector is normalized by

(2.7)  $\hat\beta\, \Phi_{xx}\, \hat\beta' = 1,$


where $\Phi_{xx}$ may be a function of the estimates of other parameters. The estimate
of $\gamma$ is $\hat\gamma = -\hat\beta M_{xu} M_{uu}^{-1}$ [2; Theorem 1]. These estimates were derived under the
following explicit Assumptions A, B, C, and D:

ASSUMPTION A. The selected structural equation (2.2) is one equation of a complete
linear system of stochastic equations. It is identified by the fact that if $H$ is the
number of coordinates in $x_t$, there are at least $H - 1$ coordinates in $v_t$, the vector of
predetermined variables in the system, but missing in (2.2).

ASSUMPTION B. At time $t$ all of the coordinates of $z_t = (u_t, v_t)$ are given.

ASSUMPTION C. The coordinates of $z_t$ are given functions of exogenous variables
and of coordinates of $y_{t-1}, y_{t-2}, \cdots$. If coordinates of $y_0, y_{-1}, \cdots$ are involved in
$z_t$, they will be considered as given numbers. The moment matrix $M_{zz}$ is non-singular
with probability one.

ASSUMPTION D. The disturbance vectors $\epsilon_t$ are distributed serially independently
and normally with mean zero and covariance matrix $\Sigma$.

Under these assumptions it is found that $(1 + \nu)^{-T/2}$ is the likelihood ratio
criterion for testing the hypothesis that the number of components of $z_t$ assumed
to have zero coefficients is so great.
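The estimate defined by (2.5) to (2.7) is the solution of a generalized characteristic-root problem. As a rough sketch of that computation, the code below handles only the special case $H = 2$, where the determinantal equation is a quadratic; here `A` stands for the matrix $M_{xs} M_{ss}^{-1} M_{sx}$, `W` for $W_{xx}$, and `Phi` for $\Phi_{xx}$. The function name is hypothetical, the sign of the returned vector is arbitrary, and a simple (non-repeated) smallest root is assumed.

```python
import math

def smallest_root_and_vector(A, W, Phi):
    """Smallest root nu of |A - nu W| = 0 for symmetric 2x2 A and W (W positive
    definite), with a row vector b solving (A - nu W) b' = 0 and b Phi b' = 1."""
    (a11, a12), (_, a22) = A
    (w11, w12), (_, w22) = W
    # |A - nu W| = q2 nu^2 - q1 nu + q0
    q2 = w11 * w22 - w12 * w12
    q1 = a11 * w22 + a22 * w11 - 2.0 * a12 * w12
    q0 = a11 * a22 - a12 * a12
    disc = max(q1 * q1 - 4.0 * q2 * q0, 0.0)
    nu = (q1 - math.sqrt(disc)) / (2.0 * q2)
    m11, m12, m22 = a11 - nu * w11, a12 - nu * w12, a22 - nu * w22
    # Both candidates lie in the null space of the singular matrix A - nu W;
    # keep the longer one to avoid a zero vector (assumes the root is simple).
    candidates = [(m12, -m11), (m22, -m12)]
    b = max(candidates, key=lambda v: v[0] ** 2 + v[1] ** 2)
    quad = (b[0] * (Phi[0][0] * b[0] + Phi[0][1] * b[1])
            + b[1] * (Phi[1][0] * b[0] + Phi[1][1] * b[1]))
    scale = math.sqrt(quad)
    return nu, (b[0] / scale, b[1] / scale)
```

For larger $H$ a general symmetric-definite eigenvalue routine would be needed; the quadratic formula is used here only to keep the sketch self-contained.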

If there are no lagged endogenous variables in $z_t$, we can find confidence
regions for $\beta$ and for $\beta$ and $\gamma$ simultaneously as well as an approximate test for
the above hypothesis. The assumptions used for these results are A, B, and

ASSUMPTION E. All the coordinates of $z_t = (u_t, v_t)$ are exogenous. The moment
matrix $M_{zz}$ is non-singular. The disturbances of the selected equation are distributed
independently and normally with mean zero and variance $\sigma^2$.

Assumptions A and B are used in this paper and a number in addition, 
which will be lettered similarly. It is to be emphasized that the various assump- 
tions are used alternatively, never all at once; in fact many assumptions are 
mutually exclusive. 

3. Consistency of the estimates. The estimates $\hat\beta$ and $\hat\gamma$ are consistent not
only in the case for which they are maximum likelihood estimates, but also in
cases in which the disturbances are not normally or even identically distributed.
Moreover, for consistency of the estimates it is not necessary that the investigator
know all of the components of $v_t$ or use them. Another direction in which the
assumptions may be relaxed is to permit the other equations in the system to be
non-linear.

3.1. The linear case. This case is characterized by Assumption A. We need
also to assume:

ASSUMPTION F. $M_{zz}$ converges to a fixed non-singular limit $R$ in probability.

Let $u_t$ consist of the part of $z_t$ that enters the selected structural equation (2.2).
The remainder of the components of $z_t$ are divided into two groups according to whether
they are known or not. Let $c_t$ be a linear transform of the known components
not entering the specified equation such that

(3.1)  $\operatorname{plim}_{T\to\infty} M_{uc} = 0,$

and let $r_t$ be a linear transform of the components of $z_t$ not known such that

(3.2)  $\operatorname{plim}_{T\to\infty} M_{ur} = 0,$

(3.3)  $\operatorname{plim}_{T\to\infty} M_{cr} = 0.$

The relevant part of the “reduced form,” obtained from (2.1) by multiplication
by $B_{yy}^{-1}$, is

(3.4)  $x_t' = \Pi_{xu} u_t' + \Pi_{xc} c_t' + \Pi_{xr} r_t' + \delta_t'.$

The matrix $(\Pi_{xc}\ \Pi_{xr})$ is $\Pi_{xv}$ (defined in [2]) multiplied on the right by a non-singular
matrix; hence, $\beta\Pi_{xc} = 0$, and similarly $\beta\Pi_{xu} = -\gamma$. We shall find it
convenient to assume

ASSUMPTION G. $\Pi_{xc}$ has rank $H - 1$.
This means that for $T$ sufficiently large the probability is arbitrarily near 1
that (2.2) is identified.
However, these conditions still do not insure consistency. We need the asymptotic
analogue of lack of correlation:

ASSUMPTION H.

$\operatorname{plim}_{T\to\infty} \dfrac{1}{T}\sum_{t=1}^{T} \delta_t' z_t = 0.$

We do not need to require that the covariance matrices of $\delta_t$ are the same or
even that they exist. We shall make an assumption about

(3.5)  $W_{xx} = M_{xx} - (M_{xu}\ M_{xc}) \begin{pmatrix} M_{uu} & M_{uc} \\ M_{cu} & M_{cc} \end{pmatrix}^{-1} \begin{pmatrix} M_{ux} \\ M_{cx} \end{pmatrix}.$

ASSUMPTION I. The ratio of the largest to the smallest characteristic roots of $W_{xx}$
is bounded in probability.
This means that for a suitable constant $K$

(3.6)  $\lim_{T\to\infty} P\left(\dfrac{l(W_{xx})}{s(W_{xx})} > K\right) = 0,$

where $P(E)$ denotes the probability of event $E$ and $s(A)$ and $l(A)$ are the smallest
and largest roots of the matrix $A$, respectively.

Assumptions F and H imply that $P_{xu} \to \Pi_{xu}$ and $P_{xc} \to \Pi_{xc}$ in probability,
where $P_{xu} = M_{xu} M_{uu}^{-1}$, and $P_{xc}$ is the part of

(3.7)  $(M_{xu}\ M_{xc}) \begin{pmatrix} M_{uu} & M_{uc} \\ M_{cu} & M_{cc} \end{pmatrix}^{-1}$

corresponding to the vector⁵ $c_t$. The first assertion follows because $M_{xu} M_{uu}^{-1} =
(\Pi_{xu} M_{uu} + \Pi_{xc} M_{cu} + \Pi_{xr} M_{ru} + M_{\delta u}) M_{uu}^{-1}$ and $M_{cu} \to 0$, $M_{ru} \to 0$, and $M_{\delta u} \to 0$
in probability by (3.1), (3.2) and Assumption H; the second assertion follows
similarly. Since matrix multiplication is continuous, and the characteristic roots
of a matrix are continuous functions of the matrix,⁶

(3.8)  $\operatorname{plim}_{T\to\infty} s[P_{xc} M_{ss} P_{xc}'] = 0,$

where $M_{ss} = (M_{cc} - M_{cu} M_{uu}^{-1} M_{uc})$. This follows from the well-known theorem
(a proof of which is given in [4]) that if a random vector $X_T$ converges stochastically
to $X$, then $f(X_T)$ converges stochastically to $f(X)$ if $f(y)$ is continuous
at $X$.

We shall find the following lemmas convenient. The proofs are simple and 
have been given in [1]. 

LEMMA 1. Let $B$ be positive definite, $A$ positive semi-definite. Then the smallest
root $\nu$ of $|A - xB| = 0$ is less than or equal to $s(A)/s(B)$.


5 See Section 4 of [2]. 


6 Because of the assertion above and Assumptions F and G only one characteristic root 
of the matrix approaches zero in probability. 
LEMMA 2. Each element of a positive definite matrix is less in absolute value
than the largest characteristic root.

Let $\nu$ be the smallest root of

(3.9)  $|P_{xc} M_{ss} P_{xc}' - \nu W_{xx}| = 0.$

Then $\operatorname{plim}_{T\to\infty} \nu W_{xx} = 0$. This statement follows from (3.8) and Lemmas 1 and 2.
Since 0 is a simple characteristic root of $\Pi_{xc}\,(\operatorname{plim}_{T\to\infty} M_{ss})\,\Pi_{xc}'$, it follows from (3.9)
and the consistency of $P_{xu}$ and $P_{xc}$ that $\hat\beta$ approaches $\beta$ apart from normalization.
The following theorem results directly:

THEOREM 1. Under Assumptions A, F, G, H, and I, and if $\operatorname{plim}_{T\to\infty} \beta\Phi_{xx}\beta' = 1$,

(3.10)  $\operatorname{plim}_{T\to\infty} \hat\beta = \beta,$

(3.11)  $\operatorname{plim}_{T\to\infty} \hat\gamma = \gamma,$

where $\hat\beta$ and $\hat\gamma$ are calculated as if $r_t = 0$ and as if the remainder of A, B, C, and D
were satisfied.⁷

3.2. The non-linear case. In this section we apply the estimates obtained in [2]
to an equation of a complete system in which the remaining equations may be
non-linear. We replace Assumption A by the following assumption:

ASSUMPTION J. The selected structural equation (2.2) is one equation of a complete
system of stochastic equations:

(3.11)  $F_i(y_t, z_t) = \epsilon_{ti} \qquad (i = 1, \cdots, G).$

Let us solve the complete system (3.11) for the components of $y_t$. We obtain

(3.12)  $y_{tj} = h_j(z_t, \epsilon_t).$

Let $u_t$ be the subvector of $z_t$ occurring in the selected structural equation.
Let $c_t$ be a vector function of $z_t$ such that $\operatorname{plim} M_{uc} = 0$. We may write (3.12)
for those $y$'s occurring in the selected structural equation as

(3.13)  $x_t' = \Pi_{xu} u_t' + \Pi_{xc} c_t' + \phi'(z_t, \epsilon_t),$

where the components of $\phi(z_t, \epsilon_t)$ are the residuals from the formal limiting
regression of $x_t$ on $u_t$ and $c_t$. The proof of Theorem 1 can be used to prove the
following:

THEOREM 2. If Assumptions F, G, H, I, and J are satisfied with $z_t$ replaced by
$(u_t, c_t)$ and $\delta_t$ replaced by $\phi(z_t, \epsilon_t)$, and $r_t = 0$, and if $\operatorname{plim}_{T\to\infty} \beta\Phi_{xx}\beta' = 1$, then

(3.14)  $\operatorname{plim}_{T\to\infty} \hat\beta = \beta,$

(3.15)  $\operatorname{plim}_{T\to\infty} \hat\gamma = \gamma.$


7? This follows from the above statements because 6 and ¥ are (vector-valued) rational 
functions of M,, , Pz, , Wtz and #,; which approach limits in probability. 


4. The asymptotic distribution of the estimates.

4.1. The asymptotic distribution of $P_{xu}$ and $P_{xc}$. To obtain the asymptotic
distribution of the estimates we need stronger assumptions. Throughout Sections
4.1 and 4.2 we use Assumptions A, B, F, H, I, and the following:

ASSUMPTION K. The exogenous variables are bounded; the vector of disturbances
of the complete system has mean zero, and is serially independent; for some $\eta > 0$
and some $M$, $E(|\delta_{ti}|^{4+\eta}) < M$; the coordinates of $z_t$ may be linear combinations of
lagged endogenous variables. If the endogenous part of a coordinate is

$\sum_{\tau=1}^{\infty}\sum_{i=1}^{G} g_{\tau i}\, y_{t-\tau,i},$

then

$\sum_{\tau=1}^{\infty}\sum_{i=1}^{G} |g_{\tau i}| < \infty$

and

$\sum_{\tau=t}^{\infty}\sum_{i=1}^{G} g_{\tau i}\, y_{t-\tau,i}$

is bounded.
ASSUMPTION L. The matrix $\Phi_{xx}$ is known and constant.

ASSUMPTION M. For each $i, j, k, l$, $1 \le i, j \le H$, $1 \le k, l \le K$,

$\lim_{T\to\infty} \dfrac{1}{T}\sum_{t=1}^{T} E(\delta_{ti}\,\delta_{tj}\, z_{tk}\, z_{tl}) = \kappa_{ijkl}$

exists.

Let the components of M,,, M,., Mz be arranged as a vector m(T) with 
mean value u(7'). It has been shown [3] that »/7'(m(T) — u(T)) is asymptotically 
distributed according to N(O, 2), the normal distribution with mean 0 and 
covariance matrix = composed of elements 


oj = lim &(T[m(T) — u(T)] (m(T) — »(T))). 


In conjunction with this result we make repeated use of a special case of Theorem
6 of [4]:

Suppose $\sqrt{T}(x_{jT} - \xi_{jT})$ $(j = 1, \cdots, n)$ have the joint asymptotic distribution
$N(0, V)$ with $\xi_{jT}$ being functions of $T$ such that $\lim_{T\to\infty} \xi_{jT} = \xi_j$. Let $f_{iT}(z_1, \cdots, z_n)$
be random Borel-measurable functions of $n$ real variables such that $\partial f_{iT}/\partial z_j = \alpha_{ijT}(z)$
exists with probability one for $T$ sufficiently large and $z$ in a fixed neighborhood
of $\xi$, and suppose that there exist numbers $a_{ij}$ such that for any $\epsilon > 0$
and $\kappa > 0$,

$P\Big(\sup_{(z - \xi_T)(z - \xi_T)' \le \kappa/T} |\alpha_{ijT}(z) - a_{ij}| > \epsilon\Big)$

approaches zero. Then if $y_{iT} = f_{iT}(x_{1T}, \cdots, x_{nT})$ and $\eta_{iT} = f_{iT}(\xi_{1T}, \cdots, \xi_{nT})$, the random variables
$\sqrt{T}(y_{iT} - \eta_{iT})$ have the joint asymptotic distribution $N(0, AVA')$, where $A = (a_{ij})$.
To obtain the asymptotic distributions we have only to verify that the assumptions
of this statement are satisfied, and compute $A$, since the asymptotic
distribution is characterized completely by $AVA'$. We shall denote the element in
the $k$-th row and $l$-th column of $AVA'$ by $\sigma(f_k, f_l)$. We shall find it convenient
to use the notation $df = A\,dx$; that is, the differential $df$ is defined in terms of the
limit matrix $A$.
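The covariance $AVA'$ in the statement above can be illustrated numerically. The sketch below is only an illustration of the computation, not part of the authors' argument: it approximates the limit matrix $A$ by a central-difference Jacobian of a fixed (non-random) function at the limit point $\xi$ and then forms $AVA'$. The function names and the example function (a ratio of two coordinates) are hypothetical.

```python
def jacobian(f, xi, h=1e-6):
    """Central-difference approximation to a_ij = d f_i / d z_j at the point xi."""
    base = f(xi)
    rows = [[0.0] * len(xi) for _ in base]
    for j in range(len(xi)):
        up = list(xi)
        dn = list(xi)
        up[j] += h
        dn[j] -= h
        f_up, f_dn = f(up), f(dn)
        for i in range(len(base)):
            rows[i][j] = (f_up[i] - f_dn[i]) / (2.0 * h)
    return rows

def delta_method_cov(f, xi, V, h=1e-6):
    """Return the matrix A V A', with A the Jacobian of f evaluated at xi."""
    A = jacobian(f, xi, h)
    n, m = len(xi), len(A)
    AV = [[sum(A[i][k] * V[k][j] for k in range(n)) for j in range(n)]
          for i in range(m)]
    return [[sum(AV[i][k] * A[j][k] for k in range(n)) for j in range(m)]
            for i in range(m)]

# Example: f(z) = z_1 / z_2 at xi = (2, 4) with V the identity matrix.
ratio = lambda z: [z[0] / z[1]]
cov = delta_method_cov(ratio, [2.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Here the exact derivatives are $1/4$ and $-1/8$, so `cov[0][0]` should be close to $1/16 + 1/64$.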


Let

(4.1)  $A = M_{\delta u},$
(4.2)  $B = M_{\delta s},$
(4.3)  $C = \operatorname{plim}_{T\to\infty} M_{uu},$
(4.4)  $E = \operatorname{plim}_{T\to\infty} M_{ss},$
(4.5)  $L = P_{xu},$
(4.6)  $P = P_{xc} = M_{xs} M_{ss}^{-1},$
(4.7)  $\Lambda = \Pi_{xu},$
(4.8)  $\Pi = \Pi_{xc}.$

The matrix $L$ is the random function $A M_{uu}^{-1} + \Pi_{xc} M_{cu} M_{uu}^{-1} + \Lambda$ of $A$; $P$ is the
random function $B M_{ss}^{-1} + \Pi$ of $B$. Then

(4.9)  $dL = (dA)\,C^{-1},$

(4.10)  $dP = (dB)\,E^{-1}.$

However

(4.11)  $\sigma(a_{ik}, a_{jl}) = \alpha_{ijkl},$
(4.12)  $\sigma(a_{ik}, b_{jl}) = \beta_{ijkl},$
(4.13)  $\sigma(b_{ik}, b_{jl}) = \gamma_{ijkl},$

where $\alpha_{ijkl}$, $\beta_{ijkl}$, $\gamma_{ijkl}$ are the appropriate quantities $\kappa_{ijkl}$, respectively. From
these we may compute $\sigma(l_{ij}, l_{kl})$, $\sigma(l_{ij}, p_{kl})$, and $\sigma(p_{ij}, p_{kl})$, the elements of the
asymptotic covariance matrix of the elements of $L$ and $P$ (which are asymptotically
normally distributed by the above). These elements can be estimated
consistently from the sample (the proof follows from Theorem 1).

4.2. The asymptotic distribution of $\hat\beta$ and $\hat\gamma$ for constant normalization. In this
section we shall show that $\hat\beta$ and $\hat\gamma$ are asymptotically normally distributed
(Theorem 3). In view of the above theorem on asymptotic distributions the
intricate part of the proof is in obtaining the covariance matrix. First we shall
demonstrate that the elements of $\nu W_{xx}$ are $o(1/\sqrt{T})$ in probability. Since Assumption
I holds, it is sufficient to show that $s(P_{xc} M_{ss} P_{xc}')$ is $o(1/\sqrt{T})$ in probability.
This means $d\,|P_{xc} M_{ss} P_{xc}'| = 0$, since each of the characteristic roots of
$P_{xc} M_{ss} P_{xc}'$ except the smallest approaches a non-zero limit in probability.
For any matrix $A$, $A_{ij}$ denotes the matrix obtained by deleting the $i$-th row
and $j$-th column from $A$, and $A_{ik,jl}$ is the matrix obtained by deleting the $i$-th
and $k$-th rows and the $j$-th and $l$-th columns. Let

$A^{ij} = (-1)^{i+j}\,|A_{ij}|,$

$A^{ik,jl} = (-1)^{i+j+k+l+\epsilon}\,|A_{ik,jl}|,$

where $\epsilon = 0$ if $(i - k)(j - l) > 0$, 1 otherwise, when $i \neq k$, $j \neq l$; $A^{ik,jl} = 0$
if $i = k$ or $j = l$. In the rest of the paper we use the summation convention of
tensor calculus for lower case indices; namely, that whenever a lower case letter
appears as a superscript and a subscript in an expression, the corresponding
terms are to be summed on that index.
In general


(4.14)  $d|A| = A^{ij}\, da_{ij}.$
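Relation (4.14) says that the first-order change in a determinant is the sum of each cofactor times the change in the corresponding element. The following small check, written for illustration only, computes cofactors by deleting a row and a column and compares the sum in (4.14) with the actual change in the determinant for a small perturbation. The helper names are hypothetical, and indices are zero-based, which leaves the parity of $i + j$ unchanged.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def cofactor(m, i, j):
    """A^{ij} = (-1)^{i+j} |A_ij|, where A_ij lacks row i and column j of m."""
    minor = [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]
    return (-1) ** (i + j) * det(minor)

def d_det(m, dm):
    """First-order change in |A| as in (4.14): sum over i, j of A^{ij} da_ij."""
    n = len(m)
    return sum(cofactor(m, i, j) * dm[i][j] for i in range(n) for j in range(n))
```

The sketch requires matrices of order two or more, since the cofactor of a 1 by 1 matrix would need the determinant of an empty matrix.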
We may consider $P_{xc} M_{ss} P_{xc}'$ as a random function of $P_{xc}$. Then


(4.15) d(i,j-th element of PzsMsPs:) = iex: dp} + wiendpi . 
However 


(4.16) (IzETz.) = p’e° = p's’, 
where p‘ is a factor of proportionality. Since 6II,, = 0, we haved | P.sM,, P;,| = 0. 
Then it can be shown that d(fl.Muflz, — PssM.Pz.) = 0, where fl. = 


(1 an wa?) Pa . 
BW.2(" 


Let © = I,.EM:, and F = P..MseP iz . We know that 8: = 0", where 
py = 1/p” (and the capital letter J indicates that there is not to be a sum on 
that index), and 6 = fi,,M..[i:, . Hence 


(4.17) dpi = p,dd” + 0 dp,. 
However §'3%g;; = 1; therefore 6, = (66*"y,,) +. From this it follows that 
(4.18) dps = —(8s)0"'pn d6". 
From (4.14) we see d6*” = 09*”'“"d6,; . Therefore 
(4.19) dg* = plo" — B'B'O' px ild 65. 
Let us define ¥; = 6'¢;;. Let us multiply (4.19) by 6,; and y,;. We obtain 
6,:d8° = ps0,0°'dbas 
= psdy 0" dbas — ps0”"'dbyg = —B"dbya, 
y.de* = 0. 
Let us simplify (4.20). We see that 


(4.20) 


(4.22) B°db,. = Bexie..dps, . 
Hence


(4 23) o(6°d6,. ? Bd 6) aa B°n'y €x1B'm y Cnje "Yami 


_ Q%& on _m_ ¢ ni 
= BB Ty Yapmi = Tiy, 


say. Let o(8*, 8’) = qi’, and let Q; = (qi). Then from (4.20) and (4.23) we obtain 


(4.24) 00.0 = Rh, 

and (4.21) is 

(4.25) ¥Q, = 0. 

It may be shown (see [1], for example) that the solution is 
(4.26) Qi = (I — BY) .x(On)*(Ridex(Oue) “I — ¥'B)e. 5 


where k(1 < k < H) is arbitrary except that 6° ~ 0, and A. denotes A with 
the k-th column deleted, etc. If the normalization is 8° = 1, k = 7 is a convenient 
choice. 

Since 4 = —£L, 


(4.27) 
Hence 
(4.28) o(8’,4") = —0(8’, BN? — o(8’, MB’, 

(4.29) o(f”, 4") = o( 8, BY)ATAF + 0(8’, BAF + 08’, U7)B'AP + o(l?, 17) 8'6’. 


We, therefore, see that we must compute o(@’, [7)8° and (I, 17)3°3’. We find, 
from (4.20), (4.21), and (4.22) that 


dy” = —dé'n? — Bid? . 























(4.30) 0, 8'0(8', GG) = —6'B’ric”’Biipk = 12>, 

say. Let (o(8’, [7)8") = Q., and let R. = (rZ,). Then, from (4.30) and (4.21) we 
obtain 

(4.31) 6Q2 = R2, 

(4.32) yQ. = 0. 

The solution is 

(4.33) Qe = (I — 6p) ..(On) (Re)e. « 

We find, readily, that 

(4.34) B'B’o(I7, 17) = B°B’c”’c™ ain = G3", 








say, where (c””) = C™*. Let Q; = (q3”). This concludes the proof of Theorem 3. 

THEOREM 3. If Assumptions A, B, F, H, I, K, L, and M are satisfied, $\sqrt{T}(\hat\beta - \beta)$
and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with means zero
and covariance matrix

(4.35)  $\sigma(\hat\beta', \hat\beta) = Q_1,$
(4.36)  $\sigma(\hat\beta', \hat\gamma) = -Q_1 \Pi_{xu} - Q_2,$
(4.37)  $\sigma(\hat\gamma', \hat\gamma) = \Pi_{xu}' Q_1 \Pi_{xu} + \Pi_{xu}' Q_2 + Q_2' \Pi_{xu} + Q_3,$

where $Q_1$ is given by (4.26), $Q_2$ by (4.33), and $Q_3$ by (4.34).
If there is a kind of asymptotic independence of $\zeta_t$ and $z_t$, then the above
expressions may be simplified. Corollary 1 results from Theorem 3 and the
following assumption:

ASSUMPTION N. $\lim_{T\to\infty} \dfrac{1}{T}\sum_{t=1}^{T} E(\zeta_t^2\, z_t' z_t) = \sigma^2 R$, where $R$ is defined in Assumption F.

COROLLARY 1. If Assumptions A, B, F, H, I, K, L, M, and N are satisfied,
$\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally distributed with
means zero and covariance matrix


(4.38) o(8’, B) = o (I — Bp) x(n) (I — WB)., 
(4.39) o(8’, 4) = —o°(I — B'p).2(Ox) (Hew + Wy )e. ; 
(4.40) go 7, 4) Pe o*[(il.. + y'v).x(Oxx) (Ten + W'y)k- + o.. 

4.3. Asymptotic distribution of the estimates of the parameters $\beta$ and $\gamma$ with
normalization a function of $\Omega_{xx}$.

If we relax Assumption L that $\Phi_{xx}$ is constant, we obtain a more general
result. Since the proof, however, is more involved, we shall not give it here;
the reader is referred to [1]. In the derivation of the estimates $\Omega_{xx}$ was defined as
$E(\delta_t' \delta_t)$. In the asymptotic theory we do not assume that this is the same for
each $t$. We use the following assumption:


= 
ASSUMPTION O. lim T a (615515 51x Zu) = Nijx exists; 
T—0 t=1 


T 
lim Z 7. &(6155:5) = i; exists; 


T—-2 T t=1 


. 1l< , cet ie 
lim T » G (612613580) = Bijnr + Gi Gyr exists. 
T—0 t= 


Let 6;jx. be the quantities n;;,; corresponding to the wu’s, €:j:, the quantities 
corresponding to the c’s. Define 


(4.41) x‘ = yet at 


dw.’ 
(4.42) tay = Beryx esi, 
(4.43) a = (I — BV)«On) (rae. , 
(4.44) qs = x? x Dijet, 
(4.45) Ge = x'B"8ijmic™. 
With the aid of the matrices $Q_1$, $Q_2$, and $Q_3$, the vectors $q_1$ and $q_2$, and the
scalar $q_3$, we may express the asymptotic covariance matrix of the estimates.
We obtain

THEOREM 4. If Assumptions A, B, F, H, I, K, M, and O are satisfied, and
$\Phi_{xx}$ is a function of $\Omega_{xx}$, $\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly
normally distributed with means zero and covariance matrix

Qi + 948 + B’qs + 58'B, 
—Q, Tew + qs = 8’ qalleu + qsB'y = Q> = B'ds ’ 
= Tie. QiTew or Thou qev oa y'qalleu + qsv'Y 
+ i1...Q» + Qeiley — v'% - ay + Q; , 
where Q: , Q2 , Qs, 91, 95, and gs are given by (4.26), (4.33), (4.34), (4.48), (4.44), 
and (4.45) respectively. 

COROLLARY 2. If Assumptions A, B, D, F, H and K are satisfied, and
$\Phi_{xx} = \Omega_{xx}$, $\sqrt{T}(\hat\beta - \beta)$ and $\sqrt{T}(\hat\gamma - \gamma)$ are asymptotically jointly normally
distributed with means zero and covariance matrix
(4.49) 0B’, 8) = (I — B'Y).n(Oux) “(UT — WB). + 46°, 

(4.50) o(8’, 4) —(I iil Bp) (Orn) (Ten + VY )k- + 38'Y, 
(4.51) a(7’,47) = (Tes + yb) .1(Oxz) (ew + Wye. + CC + 377. 


5. Asymptotic distribution of the likelihood ratio criterion and the small
sample criterion for testing a certain hypothesis. The likelihood ratio criterion
for testing the hypothesis that the number of coordinates of $z_t$ with zero coefficients
in the selected structural equation is as great as it is assumed to be is
$(1 + \nu)^{-T/2}$ [2, Theorem 2], where $\nu$ is the smallest root of


(5.1)  $|P_{xc} M_{ss} P_{xc}' - \nu W_{xx}| = 0.$
Then 


BP os Mae Ps mA ma 
Tv = z gos ge B — (W TBP =) ee 3 (WV TBP 2s)’. 


From Theorem 5 of [4] it follows that the asymptotic distribution of Tv is the 
same as that of the quadratic form x 2 x’, where x has the limiting distribution 


of ~/ TBP. , use being made of plim 8W,28’ = o°. We have 


T—20 
(5.3) dx’ = B’dp} + dB’x} . 
Let T = (I — B’p) (On) UW — WB)... Then 

(5.4) dp? = —v" 8 aeemndp? ‘ 
Substituting in (5.3), we obtain 


(5.5) dx = B'dp} — v"B’Temndpi mr . 
Then
(5.6) o(a’, 2’) = o (c” — apni") = ot" 
say, and (¢”) = =. 

Let F be chosen so EK = FF’ and F’=F = W is diagonal. Since HZEEZE = EZE, 
the diagonal elements of VY are 1 and 0. The number of elements that are 1 is 


the rank of EHZE, namely, D — H + 1, where D is the number of coordinates 
of v; (the number of coordinates whose coefficients in the selected equation are 


assumed to be zero). Let z = lor. Then the asymptotic distribution of Tv 
oC 


is the distribution of $zz'$, where $z$ is normally distributed with mean zero and
covariance matrix $\Psi$. It is the $\chi^2$-distribution with $D - H + 1$ degrees of freedom.
We observe that $T \log(1 + \nu)$ and $TD\lambda$ are asymptotically equal to $T\nu$, where
$\lambda$ is the criterion based on small sample theory [2, Theorem 4]. Finally, we note
that $\nu$ is independent of the normalization of $\beta$.

THEOREM 5. If Assumptions A, B, F, H, I, K, M, and N are satisfied, $-2$ times
the logarithm of the likelihood ratio criterion $(1 + \nu)^{-T/2}$, that is, $T \log(1 + \nu)$,
the asymptotically equivalent $T\nu$, and $TD$ times the small sample criterion, $\lambda$, for
testing the hypothesis that the number of coordinates with zero coefficients is $D$
are asymptotically distributed as $\chi^2$ with $D - H + 1$ degrees of freedom.

This theorem indicates how conservative the small sample test is asymptotically,
for that test asymptotically is equivalent to using $T\nu$ as having an
asymptotic $\chi^2$-distribution with $D$ degrees of freedom.
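The remark on conservativeness can be made concrete by comparing tail areas of $\chi^2$ with $D - H + 1$ and with $D$ degrees of freedom at the same value of $T\nu$. The sketch below is illustrative only: it evaluates the upper tail area from the standard power series for the regularized lower incomplete gamma function, which is adequate for moderate arguments, and the values of $T\nu$, $D$, and $H$ in the example are made up.

```python
import math

def chi2_sf(x, df, max_terms=1000):
    """Upper tail area P(chi-square with df degrees of freedom > x)."""
    if x <= 0:
        return 1.0
    a, h = 0.5 * df, 0.5 * x
    # Series: P(a, h) = h^a e^{-h} * sum_{n>=0} h^n / Gamma(a + n + 1)
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
    total = term
    for n in range(1, max_terms):
        term *= h / (a + n)
        total += term
        if term < 1e-17 * total:
            break
    return max(0.0, 1.0 - total)

# Hypothetical numbers: T*nu = 9.0 with D = 5 excluded variables and H = 3.
t_nu, D, H = 9.0, 5, 3
p_asymptotic = chi2_sf(t_nu, D - H + 1)   # reference distribution of Theorem 5
p_conservative = chi2_sf(t_nu, D)         # reference implied by the small sample test
```

Because the tail area at a fixed value grows with the degrees of freedom, `p_conservative` is the larger of the two, which is the sense in which the small sample test is conservative.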

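6. Asymptotic behavior of confidence regions based on small sample theory.
In [2] we deduced confidence regions for $\beta$ and for $\beta$ and $\gamma$ when Assumption E
holds. If the normalization of $\beta$ is


(6.1)  $\beta\,\Phi_{xx}\,\beta' = 1,$


where $\Phi_{xx}$ is a given matrix, then a confidence region (a) for $\beta$ of confidence $\epsilon$
consists of all $\beta^*$ satisfying (6.1) and

(6.2)  $\dfrac{\beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime}}{\beta^* W_{xx} \beta^{*\prime}} \le \dfrac{D}{T - K}\, F_{D,\,T-K}(\epsilon),$

where $F_{D,\,T-K}(\epsilon)$ is chosen so the probability of (6.2) for $\beta^* = \beta$ is $\epsilon$, $K$ is
the number of coordinates of $z_t$, and $D$ is the number of coordinates of $v_t$. A
region (b) for $\beta$ and $\gamma$ simultaneously consists of $\beta^*$ and $\gamma^*$ satisfying (6.1) and

(6.3)  $\dfrac{\beta^* M_{xu} M_{uu}^{-1} M_{ux} \beta^{*\prime} + \beta^* M_{xu} \gamma^{*\prime} + \gamma^* M_{ux} \beta^{*\prime} + \gamma^* M_{uu} \gamma^{*\prime} + \beta^* M_{xs} M_{ss}^{-1} M_{sx} \beta^{*\prime}}{\beta^* W_{xx} \beta^{*\prime}} \le \dfrac{K}{T - K}\, F_{K,\,T-K}(\epsilon).$

We shall now show that even if Assumption E does not hold the regions have
asymptotically confidence coefficients $\epsilon$ and they are consistent under general
conditions.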
Let $c = \beta M_{xu} M_{uu}^{-1} + \gamma$, $e = \beta M_{xs} M_{ss}^{-1}$. We observe from Section 4 that if
Assumptions A, B, F, H, K, L, M and N are satisfied, the vectors $\sqrt{T}c$ and
$\sqrt{T}e$ have asymptotic independent distributions $N(0, \sigma^2 C^{-1})$ and $N(0, \sigma^2 E^{-1})$,
respectively. Then $T c M_{uu} c'/\sigma^2$ and $T e M_{ss} e'/\sigma^2$ will have asymptotic independent
$\chi^2$-distributions with $F\,(= K - D)$ and $D$ degrees of freedom, respectively.
Also $\beta W_{xx} \beta'$ approaches $\sigma^2$ stochastically. By Theorems 5 and 6 of [4], the left-hand
sides of (6.2) and (6.3) have asymptotic $F$-distributions with $D$ and $T - K$
degrees of freedom and $K$ and $T - K$ degrees of freedom, respectively.

We shall prove that (a) is consistent for $\beta$; the proof is similar for (b) as a
region for $\beta$ and $\gamma$. If we replace $\beta$ by $b$ in the definition of $e$, $e M_{ss} e' = b M_{xs} M_{ss}^{-1} M_{sx} b'$.
For $b \neq \beta$ we must show that the probability that $b$ will fall in the confidence
region for $\beta$ approaches zero. The above form approaches $b\,\Pi_{xc} E\, \Pi_{xc}'\, b'$ in probability.
If $b \neq \beta$ and satisfies (6.1) then $b\,\Pi_{xc} \neq 0$ and $e M_{ss} e'$ has a non-zero limit
in probability since $E$ is positive definite. Thus $b$ is not in the limiting confidence
region.

THEOREM 6. If Assumptions A, B, F, H, I, K, M, and N are satisfied, the
confidence regions of Theorem 3 of [2] (including (a) and (b) above) are consistent,
and the regions (a) and (b) have asymptotically the confidence levels $\epsilon$.
SOME NONPARAMETRIC TESTS OF WHETHER THE LARGEST
OBSERVATIONS OF A SET ARE TOO LARGE
OR TOO SMALL


By JOHN E. WALSH


The Rand Corporation

1. Summary. Let us consider a large number n of observations which are statis- 
tically independent and drawn from continuous symmetrical populations. This 
paper presents some nonparametric tests of whether the r largest observations 
of the set are too large to be consistent with the hypothesis that these populations 
have a common median value. Tests of whether the r largest observations are 
too small to be consistent with this hypothesis are also considered. Here r is a 
given integer which is independent of n. 

Subject to some weak restrictions, it is shown that the significance level of a
test of the type presented tends to a value $\alpha$ as $n$ increases. For no admissible
value of $n$, however, does the significance level of this test exceed $2\alpha$. If the
question is whether the largest observations are too large, tests with values of $\alpha$ suitable
for significance levels can be obtained for $r > 4$. Values of $\alpha$ suitable for significance
levels can be obtained for any value of $r$ if the question is whether the largest
observations are too small ($n$ large).

Properties of the power functions of these tests are considered for the special
case in which the $r$ largest observations are from populations with common
median $\theta$, the remaining observations are from populations with common
median $\phi$, and each population has the property that the distribution of the
quantity

(sample value) − (population median)


is independent of the value of the population median. For tests of $\theta > \phi$, the
power function tends to zero as $\theta - \phi \to -\infty$ and to unity as $\theta - \phi \to \infty$. For
tests of $\phi > \theta$, the power function tends to unity as $\theta - \phi \to -\infty$ and to zero
as $\theta - \phi \to \infty$.

Analogous tests of whether the smallest observations of a set are too small or 
too large can be obtained from the tests of the largest observations by symmetry 
considerations. 

If there is strong reason to believe that the set of observations is a random
sample from a continuous population, the tests presented in this paper can be
used to decide whether the population is symmetrical. Tests of this nature are
sensitive to symmetry in the tails of the population but not to symmetry in the
central part.


2. Introduction and statement of tests. The tests derived in this paper are 
applicable to situations of the following two types: 


(a). It is known that the observations are independent and from continuous
symmetrical populations (i.e., each population has a continuous cdf $F(x)$
such that $F(x - \phi) = 1 - F(\phi - x)$, where $\phi$ is the population median).
It is desired to test whether the largest few observations are too large
(or too small) to be consistent with the assumption that the populations
have a common median value (if the 50% point of a continuous
symmetrical population is not unique, the median of this population is
defined to be the midpoint of the interval of 50% points).

(b). It is known that the observations are independent and from continuous 
populations with a common median value (e.g., the observations may 
be a sample from a continuous population). It is desired to test whether 
these populations are symmetrical (with emphasis on the tails of the 
population). 

With respect to (a), perhaps the most common practical application is that 
where the observations are assumed to be a sample from a continuous sym- 
metrical population of some special type (e.g., normal) but the values of the 
largest few observations make this assumption questionable. The nonparametric 
tests presented for (a) are easily applied and a significant result for a non- 
parametric test automatically implies that the observations are not a sample 
from the specified type of population. Furthermore, if a parametric test of this 
situation (i.e., a test based on the assumption of a sample from this special type 
of population) is significant, the nonparametric tests are useful in determining 
whether it is possible that the observations might be a sample from a continuous 
symmetrical population of some other type. 

With respect to (b), perhaps the most common application is that where the 
set of observations can be considered to be a sample from a continuous population 
and it is desired to test whether this population is symmetrical in the tails. 

Now let us consider the forms of the tests. Let $x(1), \dots, x(n)$ represent the
values of the n observations arranged in increasing order of magnitude. Then
$x(n+1-r), x(n+2-r), \dots, x(n)$ are the r largest observations of the
set. For situations of type (a), the tests of whether the r largest observations
are too large are of the form

Test 1. Accept that the r largest observations are too large to be consistent with 
the hypothesis that the populations have a common median if 


$$\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha),$$
where the i's, j's and n are integers such that
$$1 = j_1,\quad i_u < i_{u+1},\quad j_v < j_{v+1},\quad j_s < W_\alpha < n+1-r,$$
$\alpha$ is defined by
$$\alpha = \Pr\{\min\,[x(n+1-i_k) + x(j_k)] > 2\phi \mid \phi = \text{common median}\},$$
and $W_\alpha = W_\alpha(n)$ is the smallest integer satisfying the relation
$$\Pr\{x(W_\alpha) < \phi \mid \phi = \text{common median}\} \le \alpha. \tag{1}$$
In testing the hypothesis of Test 1, the principle followed is to choose
$x(n+1-r)$ and some subset of $x(n+2-r), \dots, x(n)$ for use in the test.
The integer s represents the total number of order statistics selected from
$x(n+1-r), \dots, x(n)$.

The value of $\alpha = \alpha(i_1, \dots, i_s; j_1, \dots, j_s)$ is independent of n and is given
by equation (3) in Section 3. Table 1 contains some values of the i's, j's and s
which yield values of $\alpha$ suitable for significance levels. For Test 1, values of $\alpha$
suitable for significance levels can be obtained for $r \ge 4$.
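
The values of $\alpha$ for small s can be checked by direct enumeration. The sketch below is illustrative only and is not the author's derivation: it relies on the standard fact that, for independent observations from continuous populations symmetrical about a common median, the signs of the deviations from the median are independent fair coin flips that are independent of the ranking of the absolute deviations. It therefore enumerates all $2^n$ sign patterns at $n = i_s + j_s - 1$, with the median taken as zero. The function name `alpha_exact` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def alpha_exact(i, j):
    """Exact Pr{min_k [x(n+1-i_k) + x(j_k)] > 2*phi} at n = i_s + j_s - 1, with phi = 0.

    i and j are increasing tuples of equal length s.
    """
    n = i[-1] + j[-1] - 1
    hits = 0
    for signs in product((-1, 1), repeat=n):
        # The observation whose absolute deviation has rank r is given the value +r or -r;
        # only the ranks and the signs matter for the event in question.
        x = sorted(sign * rank for rank, sign in enumerate(signs, start=1))
        # x(n+1-i_k) is x[n - i_k] and x(j_k) is x[j_k - 1] with 0-based indexing.
        if all(x[n - ik] + x[jk - 1] > 0 for ik, jk in zip(i, j)):
            hits += 1
    return Fraction(hits, 2 ** n)


if __name__ == "__main__":
    for i, j in [((4,), (1,)), ((7,), (2,)), ((4, 5), (1, 2)), ((4, 5, 6), (1, 2, 3))]:
        print(i, j, float(alpha_exact(i, j)))
```

The enumeration grows as $2^n$, so it is practical only for the small values of $i_s + j_s$ that appear in Table 1.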

TABLE 1
Some values of α for s ≤ 5

    α     s    i1  i2  i3  i4  i5    j1  j2  j3  j4  j5
  .0625   1     4                     1
  .0312   1     5                     1
  .0156   1     6                     1
  .0078   1     7                     1
  .0039   1     8                     1
  .0352   1     7                     2
  .0195   1     8                     2
  .0107   1     9                     2

  .0469   2     4   5                 1   2
  .0234   2     5   6                 1   2
  .0117   2     6   7                 1   2
  .0059   2     7   8                 1   2

  .0391   3     4   5   6             1   2   3
  .0195   3     5   6   7             1   2   3
  .0098   3     6   7   8             1   2   3

  .0459   4     4   5   6   7         1   2   3   4
  .0229   4     5   6   7   8         1   2   3   4
  .0115   4     6   7   8   9         1   2   3   4

  .0308   5     4   5   6   7   8     1   2   3   4   5
  .0154   5     5   6   7   8   9     1   2   3   4   5
  .0077   5     6   7   8   9  10     1   2   3   4   5








If the n independent observations satisfy the additional conditions (A):
(i). Asymptotically ($n \to \infty$), $x(W_\alpha)$ is statistically independent of
$\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s]$.
(ii). The standard deviations of $x(W_\alpha)$ and $\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s]$
exist for all $n \ge i_s + j_s - 1$ and the limiting ratio ($n \to \infty$)
of these standard deviations is either zero or infinite.
(iii). Let the notation $\sigma(z)$ denote the standard deviation of z. Then, if the
populations have a common median $\phi$, asymptotically the cdf's of
$[x(W_\alpha) - \phi]/\sigma[x(W_\alpha)]$ and $\{\min\,[x(n+1-i_k) + x(j_k)] - 2\phi\}/\sigma\{\min\,[x(n+1-i_k) + x(j_k)]\}$
are continuous at the point zero.
then the significance level of Test 1 approaches the value $\alpha$ as n tends to infinity.

Although conditions (A) may appear to be complicated, they are not very 
restrictive. These conditions are satisfied if the n observations are a sample 
from a continuous population of the type usually encountered in practical 
situations (i.e., approximated in practical situations). Perhaps the most well 
known type of continuous symmetrical population for which a sample does not 
satisfy conditions (A) is that with a triangular probability density function. 
Part (ii) of conditions (A) is not satisfied for a sample from a population of 
this type. 

For large n, relation (1) with the equality sign is approximately satisfied if
$W_\alpha = [\tfrac12 n + \tfrac12 K_\alpha \sqrt{n}]$ (i.e., the largest integer contained in $\tfrac12 n + \tfrac12 K_\alpha\sqrt{n}$).
Here $K_\alpha$ is the standardized normal deviate exceeded with probability $\alpha$. This
value for $W_\alpha$ was obtained from the normal approximation to the binomial
distribution and furnishes a reasonably accurate solution of (1) with the equality
sign for n > 10, (see [1]).
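
For comparison, $W_\alpha$ can be computed both from relation (1) directly and from the normal approximation. The sketch below is a minimal illustration and the function names are hypothetical. It assumes that relation (1) reads $\Pr\{x(W_\alpha) < \phi\} \le \alpha$ and uses the fact that $x(W) < \phi$ exactly when at least W of the n observations fall below the median, so the probability is an upper binomial tail with success probability 1/2.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def w_exact(n, alpha):
    """Smallest W with Pr{Bin(n, 1/2) >= W} <= alpha, or None if no such W exists."""
    for w in range(1, n + 1):
        tail = sum(comb(n, m) for m in range(w, n + 1)) / 2 ** n
        if tail <= alpha:
            return w
    return None


def w_normal(n, alpha):
    """Largest integer contained in n/2 + K_alpha * sqrt(n) / 2."""
    k_alpha = NormalDist().inv_cdf(1.0 - alpha)  # deviate exceeded with probability alpha
    return floor(0.5 * n + 0.5 * k_alpha * sqrt(n))


if __name__ == "__main__":
    for n in (10, 50, 100):
        print(n, w_exact(n, 0.05), w_normal(n, 0.05))
```

The approximation carries no continuity correction, so the two values need not coincide; the exact value is the one for which the bound on the significance level is guaranteed.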

As an example of a test of type 1, let $r = 5$, $s = 2$, $j_1 = 1$, $j_2 = 2$, $i_1 = 4$,
$i_2 = 5$. Then $\alpha = .0547$ and the test is (approximately)

Test 2. Accept the specified alternative of Test 1 if

$$\min\,[x(n-3) + x(1),\ x(n-4) + x(2)] > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}).$$

That this is a test of whether the 5 largest observations are too large is intuitively
evident from the fact that a significant result will be obtained only if both

$$x(n-3) > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}) - x(1),$$
$$x(n-4) > 2x(\tfrac12 n + \tfrac12 K_{.0547}\sqrt{n}) - x(2).$$

If the smallest two of the five largest observations are too large, it seems reasonable
to suppose that all of the five are too large. A similar interpretation exists
for all tests of the type of Test 1.

The type (a) tests of whether the largest observations are too small are of 
the form 

Test 3. Accept that the r largest observations are too small to be consistent with
the hypothesis that the populations have a common median value if

$$\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n+1-W_\alpha),$$

where $j_1 = 1$, $j_v < j_{v+1}$, $i_u < i_{u+1}$, $i_s < n+1-W_\alpha < n+1-r$, and both $\alpha$
and $W_\alpha$ are defined in Test 1.

From the results for Test 1 and symmetry considerations, the significance
level of Test 3 tends to $\alpha$ as $n \to \infty$ if conditions (A) are satisfied; it does not
exceed $2\alpha$ for any admissible value of n. For Test 3, values of $\alpha$ suitable for
significance levels can be obtained for all values of r (n sufficiently large).

As indicated by (2), the tests of whether the largest observations are too large
can also be interpreted as tests of whether the smallest observations are too
large. Similarly the tests of whether the largest observations are too small can
also be interpreted as tests of whether the smallest observations are too small.

The above discussion presents intuitive reasons for believing that Tests 1 and 3 
are suitable for the situations to which they are applied. To obtain a semi- 
quantitative measure of the suitability of these tests, this paper investigates 
the special case in which the r largest observations are from continuous
symmetrical populations with common median $\theta$, the remaining observations are
from continuous symmetrical populations with common median $\phi$, and each
population has the property that the distribution of $x - y$ is independent of y,
where x is an observation from the population and y is the median of the
population. The power function of a test of type 1 or 3 is defined to be the probability
that the test is significant given the value of $\theta - \phi$. It is found that the power
functions of these tests have several desirable properties: For Test 1, the power
function tends to zero as $\theta - \phi \to -\infty$, is a monotonically increasing function
of $\theta - \phi$ for $\theta - \phi < 0$, and tends to unity as $\theta - \phi \to \infty$. For Test 3, the
power function tends to zero as $\theta - \phi \to \infty$, is monotonically decreasing for
$\theta - \phi < 0$, and tends to unity as $\theta - \phi \to -\infty$.

For testing whether the populations are symmetrical in the tails given that 
they are continuous and have a common median, i.e., situation (b), a combination 
of 1 and 3 is used. The resulting test is 

Test 4. Accept that the populations are not symmetrical in the tails if either

$$\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)$$

or

$$\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2x(n+1-W_\alpha),$$

where $\alpha < \tfrac14$, $i_u < i_{u+1}$, $j_v < j_{v+1}$, $j_w \le i_w$, $j_s < W_\alpha < n+1-i_s$, and both $\alpha$
and $W_\alpha$ are defined in Test 1.

Since both inequalities in Test 4 can not be satisfied simultaneously, the
significance level of Test 4 tends to $2\alpha$ as $n \to \infty$ if conditions (A) are satisfied;
it never exceeds $4\alpha$ for any admissible value of n.

The asymptotic distribution ($n \to \infty$) of $x(W_\alpha)$ is usually not very sensitive
to symmetry of the populations. For example, if the n observations are a sample
from a population with a probability density function $f(x)$ such that $f(\phi) \ne 0$
($\phi$ = population 50% point), and $f'(x)$ exists and is continuous in a neighborhood
of $x = \phi$, it can be shown that the only property of $f(x)$ which influences the
asymptotic distribution of $x(W_\alpha)$ is the value of $f(\phi)$. Thus, since a type 1 test
investigates both whether the largest observations are too large and whether
the smallest observations are too large (to be consistent with the assumption of
symmetry), while a type 3 test investigates both whether the largest observations
are too small and whether the smallest observations are too small, Test 4 should
be suitable for testing whether a population has symmetrical tails.

3. Theorems and derivations. The fundamental fact used in this paper is
that, if the observations are from continuous symmetrical populations with
common median $\phi$, the value of

$$\Pr\{\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s] > 2\phi\} = \Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\} \tag{2}$$

is independent of n for the values of n permitted in the tests. This result is a
special case of the following theorem

THEOREM 1. Consider a set of n independent observations from continuous
symmetrical populations with common median $\phi$. Let $i_1 < \cdots < i_s$ and $j_1 < \cdots < j_s$
be fixed sets of integers whose values are independent of n. Then the value of

$$\Pr\{t\text{th largest of } [x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$$

is the same for all values of n which are $\ge i_s + j_s - 1$. In particular

$$\alpha = \frac{1}{2^{w}}\Bigl\{1 + m(1) + \sum_{h_1=1}^{m(2)} [m(1) - h_1] + \sum_{h_2=1}^{m(3)}\ \sum_{h_1=1}^{m(2)-h_2} [m(1) - h_1 - h_2] + \cdots + \sum_{h_{u-1}=1}^{m(u)}\ \sum_{h_{u-2}=1}^{m(u-1)-h_{u-1}} \cdots \sum_{h_1=1}^{m(2)-h_2-\cdots-h_{u-1}} [m(1) - h_1 - \cdots - h_{u-1}]\Bigr\}, \tag{3}$$

where

$$w = i_s + j_s - 1,\quad u = j_s - 1,\quad m(j_t + v_t - 1) = i_s + j_s - i_t - j_t - v_t + 1,$$
$$t = 0, 1, \dots, s-1,\quad 1 \le v_t \le j_{t+1} - j_t;\quad i_0 = j_0 - 1 = 0.$$
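
As a check on the reading of (3), here is a worked instance using one parameter set from Table 1; it is an illustration under the assumption that the prefactor of (3) is $2^{-w}$ and that $m(j_t + v_t - 1) = i_s + j_s - i_t - j_t - v_t + 1$. Take $s = 3$, $(i_1, i_2, i_3) = (4, 5, 6)$ and $(j_1, j_2, j_3) = (1, 2, 3)$, so that $w = 8$ and $u = 2$. The case $t = 0$ contributes nothing because $j_1 - j_0 = 0$, while $t = 1$ and $t = 2$ each allow only $v_t = 1$:

$$m(1) = 6 + 3 - 4 - 1 - 1 + 1 = 4, \qquad m(2) = 6 + 3 - 5 - 2 - 1 + 1 = 2,$$

$$\alpha = \frac{1}{2^{8}}\Bigl\{1 + m(1) + \sum_{h_1=1}^{m(2)} [m(1) - h_1]\Bigr\} = \frac{1 + 4 + (3 + 2)}{256} = \frac{10}{256} \approx .0391,$$

which agrees with the entry of Table 1 for these values of the i's and j's.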
PROOF. It is sufficient to prove the theorem for the expression

$$\Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\},$$

since any probability expression of the form $\Pr\{t\text{th largest of } [\ ] < 2\phi\}$
can be expressed as a specified constant plus a sum of probabilities of the form
$\Pr\{\max\,[\ ] < 2\phi\}$ multiplied by specified constants, where in each case the
terms in the $[\ ]$ are a subset of the s terms: $x(n+1-j_k) + x(i_k)$, $(1 \le k \le s)$.
Let the integer n have the value $n_0$. Then it can be verified that

$$\Pr\{\max\,[x(n_0+1-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\} = \Pr\{\max\,\{2x(n_0 - j_s),\ x[n_0+1-W] + x[n_0+1-W-m(W)];\ 1 \le W \le j_s\} < 2\phi\}, \tag{4}$$

where

$$m(j_t + v_t - 1) = n_0 + 2 - i_t - j_t - v_t,\quad m(j_s) = n_0 - i_s - j_s + 1,$$
$$t = 0, 1, \dots, s-1,\quad 1 \le v_t \le j_{t+1} - j_t;\quad i_0 = j_0 - 1 = 0,$$

by the use of Theorem 4 of [2]. By the proof of Theorem 5 of [2], the value
of the second term in (4) equals

$$\Pr\{\max\,\{2x(n_0 - j_s),\ x[n_0+2-W] + x[n_0+1-W-m(W)];\ 1 \le W \le j_s + 1\} < 2\phi\}$$

if $m(j_s + 1) = 1$ and the expression is based on $n_0 + 1$ rather than $n_0$ observations
(the values of the m's are the same as in (4)). The value of this expression,
however, can be shown to equal the value of

$$\Pr\{\max\,\{2x(n_0+1-j_s),\ x[n_0+2-W] + x[n_0+2-W-m(W)];\ 1 \le W \le j_s\} < 2\phi\},$$

which by (4) equals the value of

$$\Pr\{\max\,[x(n_0+2-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$$

if $n = n_0 + 1$ for this expression. Thus, by induction, the value of

$$\Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2\phi\}$$

is the same for all sample sizes $n \ge i_s + j_s$. An analysis similar to that used
in the proof of Theorem 5 of [2] shows that this also holds for $n = i_s + j_s - 1$.
Equation (3) was obtained by taking $n = w = i_s + j_s - 1$, the m's as given by
(4) with this value of n, and substituting into Theorem 4 of [2].

Another basic result is that, if the observations are from continuous symmetrical
populations with common median $\phi$, the value of

$$\Pr\{\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\} = \Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2x(n+1-W_\alpha)\}$$

is always less than or equal to $2\alpha$. This is a particular application of the theorem

THEOREM 2. Consider n independent observations from continuous symmetrical
populations with common median $\phi$. Then, for any integer W,

$$\Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s] < 2x(W)\} \le \Pr\{\max\,[x(n+1-j_k) + x(i_k)] < 2\phi\} + \Pr\{x(W) > \phi\} - \Pr\{\max\,[x(n+1-j_k) + x(i_k)] < 2\phi,\ x(W) > \phi\}.$$

PROOF.

$$\begin{aligned}
\Pr\{\max[\ ] < 2x(W)\} &= \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} \\
&\quad + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi,\ \max[\ ] < 2x(W)\} \\
&\quad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi,\ \max[\ ] < 2x(W)\} \\
&\le \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\} + \Pr\{\max[\ ] < 2\phi,\ x(W) < \phi\} \\
&\quad + \Pr\{\max[\ ] > 2\phi,\ x(W) > \phi\} \\
&= \Pr\{\max[\ ] < 2\phi\} + \Pr\{x(W) > \phi\} - \Pr\{\max[\ ] < 2\phi,\ x(W) > \phi\}.
\end{aligned}$$

If the n independent observations satisfy conditions (A) in addition to being
from continuous symmetrical populations with a common median value, the
significance level of Tests 1 and 3 tends to $\alpha$ as $n \to \infty$. This follows from
symmetry considerations and

THEOREM 3. Consider n independent observations which satisfy conditions (A)
and are from continuous symmetrical populations with a common median value.
Then

$$\lim_{n\to\infty} \Pr\{\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s] > 2x(W_\alpha)\} = \alpha.$$

PROOF. Let

$$Y = \min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s]$$

and consider the case where

$$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = 0.$$

Since the populations are continuous, $\sigma(Y) > 0$ and

$$\Pr[Y > 2x(W_\alpha)] = \Pr[Y - 2\phi > 2x(W_\alpha) - 2\phi] = \Pr\{[Y - 2\phi]/\sigma(Y) > 2[x(W_\alpha) - \phi]/\sigma(Y)\}.$$

Let

$$Z = 2[x(W_\alpha) - \phi]/\sigma(Y).$$

Then, from (i) of conditions (A),

$$\Pr[Y > 2x(W_\alpha)] = \int_{-\infty}^{\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > a\}\, dF_n(a) + \beta(n),$$

where $F_n$ is the cdf of Z and $\lim_{n\to\infty} \beta(n) = 0$.

Let b be any positive number. From $\lim_{n\to\infty} \sigma(Z) = 0$, (ii) of conditions (A), and
the definition of $x(W_\alpha)$, the mean of Z exists for all values of n and tends to
zero as $n \to \infty$. Then, by Tchebycheff's Inequality, it can be shown that

$$\int_{-b}^{b} dF_n(a) = 1 - \gamma(n),$$

where $\lim_{n\to\infty} \gamma(n) = 0$.

From (iii) of conditions (A)

$$\lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > -b\} = \lim_{n\to\infty} \Pr\{[Y - 2\phi]/\sigma(Y) > b\} + \delta(b),$$

where $\lim_{b\to 0} \delta(b) = 0$.

Using the above relations, letting $n \to \infty$ first and then $b \to 0$, it follows from
Theorem 1 that

$$\lim_{n\to\infty} \Pr[Y > 2x(W_\alpha)] = \Pr\{[Y - 2\phi]/\sigma(Y) > 0\} = \alpha.$$

A similar type proof shows that this limiting relation also holds when
$\lim_{n\to\infty} \sigma[x(W_\alpha)]/\sigma(Y) = \infty$.
Finally consider properties of the power functions of Tests 1 and 3 for the
special situation outlined in sections 1 and 2. The properties stated in the
preceding two sections follow from

THEOREM 4. Let $x(n+1-r), \dots, x(n)$ be from continuous symmetrical populations
with common median $\theta$, the remaining order statistics from continuous symmetrical
populations with common median $\phi$, and each population have the property
that the distribution of $x - y$ is independent of y, where x is an observation from
the population and y is the median of the population. Also let

$$P_1(\Phi) = \Pr\{\min\,[x(n+1-i_k) + x(j_k);\ 1 \le k \le s \le r] > 2x(W_\alpha) \mid \theta - \phi = \Phi\},$$

where the conditions for Test 1 are satisfied, and

$$P_3(\Phi) = \Pr\{\max\,[x(n+1-j_k) + x(i_k);\ 1 \le k \le s \le r] < 2x(n+1-W_\alpha) \mid \theta - \phi = \Phi\},$$

where the conditions for Test 3 are satisfied. Then

$$\lim_{\Phi\to-\infty} P_1(\Phi) = 0,\quad \lim_{\Phi\to\infty} P_1(\Phi) = 1,\quad \lim_{\Phi\to-\infty} P_3(\Phi) = 1,\quad \lim_{\Phi\to\infty} P_3(\Phi) = 0,$$

$P_1(\Phi)$ is a monotonically increasing function of $\Phi$ for $\Phi < 0$, and $P_3(\Phi)$ is a
monotonically decreasing function of $\Phi$ for $\Phi < 0$.

PROOF. It is sufficient to prove this theorem for the power function of Test 3.
The results for $P_1(\Phi)$ can be obtained from symmetry considerations and obvious
modifications of the proof for $P_3(\Phi)$.

First consider $P_3(\Phi)$ for the case where $\Phi \le 0$. Let a new set of observations
be formed from the given set by subtracting the median value of the corresponding
population from each observation. Let $y(1), \dots, y(n)$ be the values
of the set of modified observations arranged in increasing order of magnitude.
Since $\Phi \le 0$, $\theta \le \phi$ and

$$x(t) = y(t) + \phi,\quad 1 \le t \le n - r,$$
$$x(t) = y(t) + \theta,\quad n - r + 1 \le t \le n.$$

Thus

$$P_3(\Phi) = \Pr\{\max\,[y(n+1-j_k) + y(i_k);\ 1 \le k \le s \le r] - 2y(n+1-W_\alpha) < -\Phi\},$$

whence it follows that $P_3(\Phi)$ is a monotonically decreasing function of $\Phi$ for
$\Phi \le 0$ and that $\lim_{\Phi\to-\infty} P_3(\Phi) = 1$.

Now consider the case where $\Phi > 0$. Again form the set of modified observations
and let $y(1), \dots, y(n)$ be the values of these observations arranged in
increasing order of magnitude. Then it is easily seen that

$$P_3(\Phi) \le \Pr[y(1) - y(n) < -\tfrac12\Phi]$$

so that $\lim_{\Phi\to\infty} P_3(\Phi) = 0$.

ON A MEASURE OF DEPENDENCE BETWEEN
TWO RANDOM VARIABLES


By Nils Blomqvist
University of Stockholm and Boston University


1. Summary. The properties of a measure of dependence q’ between two
random variables are studied. It is shown (Sections 3-5) that q’ under fairly
general conditions has an asymptotically normal distribution and provides
approximate confidence limits for the population analogue of q’. A test of
independence based on q’ is non-parametric (Section 6), and its asymptotic efficiency
in the normal case is about 41% (Section 7). The q’-distribution in the case of
independence is tabulated for sample sizes up to 50.


2. Introduction and definitions. In drawing conclusions from statistical data 
it frequently happens that it is unnecessary to utilize all the information given 
by the data. In such cases it seems desirable to use methods which are 

1) valid under rather weak assumptions regarding the distribution of the 
population and 

2) easy to deal with in practice. 

Naturally such methods should always be used, but their applicability is, in 
most cases, limited by their small efficiency. 

Concerning methods of measuring correlation and testing independence some 
so-called rank correlation coefficients have been defined [2, 3, 4, 6] which have 
the first property. In large samples these are, however, rather tiresome to calcu- 
late, and a simpler method might then be preferable. The coefficient studied 
here has in most cases both properties mentioned above and can be used when- 
ever its efficiency is not too small. 

Let $(x_1, y_1), \dots, (x_n, y_n)$ be a sample from a two-dimensional population with
cdf $F(x, y)$, and consider the two sample medians $\tilde{x}$ and $\tilde{y}$. The cdf $F(x, y)$ is
assumed to have continuous marginal cdf's $F_1(x)$ and $F_2(y)$ in order that the
probability of obtaining two equal x-values or two equal y-values in the sample
will be zero. Let the x, y-plane be divided into four regions by the lines $x = \tilde{x}$
and $y = \tilde{y}$. It is then clear that some information about the correlation between
x and y can be obtained from the number of sample points, say $n_1$, belonging
to the first or third quadrants compared with the number, say $n_2$, belonging
to the second or fourth quadrants.

Before going further we shall explain what is meant here by ‘belong to’. If
the sample size n is an even number the calculation of $n_1$ and $n_2$ is evident. If,
however, n is an odd number one or two sample points must fall on the lines
$x = \tilde{x}$ and $y = \tilde{y}$. In the first case this sample point shall not be counted. In
the other case one point falls on each of the lines. Then one of the points shall
be said to belong to the quadrant touched by both points, while the other shall
not be counted. It is easy to verify that both $n_1$ and $n_2$ by this method will be
even numbers.

As a measure of correlation we define

$$q' = \frac{n_1 - n_2}{n_1 + n_2} = \frac{2n_1}{n_1 + n_2} - 1 \qquad (-1 \le q' \le 1). \tag{1}$$

The definition of q’ is not new [5] but as far as is known, its statistical
properties have never been studied completely.
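
The counting rule can be made concrete with a short sketch. This is an illustration, not code from the paper: the function names are hypothetical, the sample is assumed to contain no ties in x or in y (as the continuity assumption guarantees with probability one), and for odd n the quadrant "touched by both points" is taken to be the one fixed by the side of $x = \tilde{x}$ on which the point on the horizontal line lies and the side of $y = \tilde{y}$ on which the point on the vertical line lies.

```python
from statistics import median


def quadrant_counts(points):
    """Return (n1, n2): points in quadrants I/III and II/IV about the sample medians."""
    mx = median(x for x, _ in points)
    my = median(y for _, y in points)
    n1 = n2 = 0
    side_x = side_y = None  # sides taken by the two points lying on the median lines
    for x, y in points:
        dx, dy = x - mx, y - my
        if dx == 0 and dy == 0:
            continue  # a single point on both lines is not counted
        if dx == 0:
            side_y = dy  # point on the line x = median: remember its side in y
        elif dy == 0:
            side_x = dx  # point on the line y = median: remember its side in x
        elif dx * dy > 0:
            n1 += 1
        else:
            n2 += 1
    if side_x is not None and side_y is not None:
        # One of the two boundary points is counted in the quadrant touched by both.
        if side_x * side_y > 0:
            n1 += 1
        else:
            n2 += 1
    return n1, n2


def q_prime(points):
    """q' = (n1 - n2) / (n1 + n2), as in definition (1)."""
    n1, n2 = quadrant_counts(points)
    return (n1 - n2) / (n1 + n2)


if __name__ == "__main__":
    print(q_prime([(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]))
    print(q_prime([(1, 1), (2, 3), (3, 2), (4, 4)]))
```

In both the even and the odd case the two counts returned are even numbers, in line with the remark above.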


3. The asymptotic distribution. It is known [1] that the median in a sample 
from a one-dimensional distribution under certain conditions is a consistent 
estimate of the population median and asymptotically normally distributed. 
Although it seems possible to weaken the requirements in our case, we shall not 
do so. We require that 

a) the population medians are uniquely defined (and assumed to equal zero),

b) the marginal distributions of $F(x, y)$ admit density functions $f_1(x)$ and
$f_2(y)$,

c) $f_1(x)$, $f_2(y)$ and their first derivatives are continuous in some neighbourhood
of the origin and

d) $f_1(0)$ and $f_2(0)$ are $\ne 0$.

In order to avoid trivial complications we shall assume here that the sample
size $n = 2k + 1$.
Now define for every arbitrarily chosen point $(x, y)$

$$\begin{aligned}
a(x, y) &= P\{\xi > x,\ \eta > y\}, \\
b(x, y) &= P\{\xi < x,\ \eta > y\}, \\
c(x, y) &= P\{\xi < x,\ \eta < y\}, \\
d(x, y) &= P\{\xi > x,\ \eta < y\},
\end{aligned} \tag{2}$$

where the measure P refers to the cdf $F(x, y)$ and evidently
$a + b + c + d = 1$.


As the number of sample points belonging to the first and third quadrants
around $(\tilde{x}, \tilde{y})$ must be equal, the probability of the combined event

$$\{n_1 = 2r;\ \tilde{x} \in (x, x + dx),\ \tilde{y} \in (y, y + dy)\}$$

is

$$p_k(2r; x, y) = \frac{(2k+1)!}{(r!)^2\,((k-r)!)^2}\,(ac)^r\,(bd)^{k-r}\cdot S, \tag{3}$$

where

$$S = \frac{r}{a}\,d_x a\,d_y a - \frac{k-r}{b}\,d_x b\,d_y b + \frac{r}{c}\,d_x c\,d_y c - \frac{k-r}{d}\,d_x d\,d_y d + dF. \tag{4}$$
Each of the first four terms of the expression (4) refers to a case in which two
sample points determine $(\tilde{x}, \tilde{y})$, and the last term refers to a case in which $(\tilde{x}, \tilde{y})$
is determined by only one point. From (3) it follows that the probability of
obtaining $n_1$ at most equal to 2R is

$$P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \sum_{r=0}^{R} p_k(2r; x, y). \tag{5}$$

If we introduce the joint cdf $\Psi_k(x, y)$ of $\tilde{x}$ and $\tilde{y}$, (5) can be written

$$P\{n_1 \le 2R\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} d\Psi_k(x, y)\cdot \frac{\sum_{r=0}^{R} p_k(2r; x, y)}{\sum_{r=0}^{k} p_k(2r; x, y)}, \tag{6}$$

as

$$d\Psi_k(x, y) = \sum_{r=0}^{k} p_k(2r; x, y).$$

Clearly the integrand in (6) is $\le 1$ everywhere it exists. In the points $(x, y)$
where the denominator is equal to zero the integrand is undefined, but as the
measure ($\Psi_k$) of the set of such points is zero, we need not have any trouble
with them.

Under the conditions a)-d) $\tilde{x}$ and $\tilde{y}$ converge in probability to zero; that is

$$\lim_{k\to\infty} \Psi_k(x, y) = \begin{cases} 1 & \text{for } \{x > 0,\ y > 0\}, \\ 0 & \text{otherwise.} \end{cases}$$

Thus, when k and R tend to infinity such that $R/k \to$ const, (6) becomes

$$\lim P\{n_1 \le 2R\} = \lim \frac{\sum_{r=0}^{R} p_k(2r; 0, 0)}{\sum_{r=0}^{k} p_k(2r; 0, 0)}. \tag{7}$$

According to (3)

$$p_k(2r; 0, 0) = \frac{(2k+1)!}{(r!)^2\,((k-r)!)^2}\,(a_0 c_0)^r\,(b_0 d_0)^{k-r}\cdot S_0, \tag{8}$$

where the subscripts indicate the value at the point (0, 0). Because of (2),
$c_0 = a_0$, $d_0 = b_0$ and $a_0 + b_0 = \tfrac12$,
and the two parts of (8) are for large k


(2k + 1)! 


1 
27rdo boV/ 2xk - 


or 12(k—r) —((r—2kag) 2/4kagho) 
<a ~ . -_ 
and


~9(2,0,-OO.+OG).- ORs 


The first of these expressions follows from the usual application of Stirling’s
approximation formula and we omit all details here.
Hence, after the introduction of

$$r = 2ka_0 + t\sqrt{2ka_0 b_0},\qquad R = 2ka_0 + \tau\sqrt{2ka_0 b_0},$$

the expression (7) is transformed to

$$\lim P\Bigl\{\frac{n_1 - 4ka_0}{\sqrt{8ka_0 b_0}} \le \tau\Bigr\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\tau} e^{-t^2/2}\,dt. \tag{9}$$

From (9) it follows that $n_1$ is asymptotically normally distributed with mean
$4ka_0$ and standard deviation $\sqrt{8ka_0 b_0}$. Thus

$$q' = \frac{2n_1}{2k} - 1 = \frac{n_1}{k} - 1$$

is asymptotically normally distributed with mean $4a_0 - 1$ and standard deviation
$2\sqrt{a_0(1 - 2a_0)/k}$.

4. Properties as an estimator. Suppose we measure the correlation between
x and y by

$$q = 2\Bigl[\int_{0}^{\infty}\!\!\int_{0}^{\infty} dF + \int_{-\infty}^{0}\!\!\int_{-\infty}^{0} dF\Bigr] - 1 = 4a_0 - 1, \tag{10}$$

where, as before, (0, 0) are the coordinates of the population medians. Then q
has the desired property of being equal to zero in the case of independence and
equal to $\pm 1$ in the case of linear relationship between x and y.

According to (9) q’ is a consistent estimate of q when the conditions a)-d) are
fulfilled. Furthermore, as the standard deviation of q’ is, to a first approximation,
independent of quantities other than q, it is possible to construct approximate
confidence limits for q for large sample sizes. This is done in the following way.
In terms of n and q we have, according to the last paragraph of section 3 and
(10),

$$E q' \sim q, \qquad \sigma(q') \sim \sqrt{\frac{1 - q^2}{n}}.$$

Let $\Phi(x)$ be a standardized normal cdf and $\lambda_1$ and $\lambda_2$ two numbers such that
$\Phi(\lambda_2) - \Phi(\lambda_1) = 1 - \alpha$. According to (9) we then have

$$P\Bigl\{\lambda_1 < \frac{q' - q}{\sqrt{1 - q^2}}\sqrt{n} < \lambda_2\Bigr\} \sim 1 - \alpha, \tag{11}$$

which gives the desired result.
If we let $\lambda_2 = -\lambda_1 = \lambda$ and solve the inequality in (11) for q, the following
symmetrical confidence interval is obtained

$$q' - \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)} < q < q' + \frac{\lambda}{n}\sqrt{\lambda^2 + n(1 - q'^2)},$$

where we have used that $\lambda^2 \ll n$.
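
The symmetrical interval is easy to evaluate numerically. The sketch below is illustrative, with a hypothetical function name; it takes $\lambda$ as the two-sided normal deviate for the chosen confidence coefficient and applies the large-sample formula $q' \pm (\lambda/n)\sqrt{\lambda^2 + n(1 - q'^2)}$, so it inherits the approximation $\lambda^2 \ll n$.

```python
from math import sqrt
from statistics import NormalDist


def q_confidence_interval(q_prime, n, confidence=0.95):
    """Approximate symmetrical confidence interval for q from q' and the sample size n."""
    lam = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # lambda with Phi(lam) - Phi(-lam) = confidence
    half_width = (lam / n) * sqrt(lam ** 2 + n * (1.0 - q_prime ** 2))
    return q_prime - half_width, q_prime + half_width


if __name__ == "__main__":
    print(q_confidence_interval(0.0, 400))
    print(q_confidence_interval(0.5, 400))
```

The interval is widest near $q' = 0$ and narrows as $|q'|$ approaches 1, reflecting the factor $1 - q'^2$ in the standard deviation.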


5. The normal case. If x and y are normally distributed with correlation
coefficient $\rho$, we have

$$q = \frac{2}{\pi}\arcsin\rho. \tag{12}$$

This expression is the same as the mean of Esscher-Kendall’s rank correlation
coefficient $\tau$ [2, 4]. Hence, in the normal case q’ and $\tau$ estimate the same quantity.
The coefficient q’ has, however, a much smaller efficiency. The asymptotic
efficiency of q’ relative to the afore mentioned coefficient is

$$\frac{\sigma^2(\tau)}{\sigma^2(q')} = \frac{4\bigl[\tfrac19 - \bigl(\tfrac{2}{\pi}\arcsin\tfrac{\rho}{2}\bigr)^2\bigr]}{1 - \bigl(\tfrac{2}{\pi}\arcsin\rho\bigr)^2} = \frac{4}{9}$$

for $\rho = 0$.

6. Tests of independence based on q’. In testing independence between x
and y it is in practice more convenient to use critical regions based on $n_1$ instead
of q’. Since, under the null hypothesis, the measure of a critical region is
independent of $F(x, y)$ ($F_1(x)$ and $F_2(y)$ are assumed to be continuous), any test
based on $n_1$ is non-parametric. We have made exact calculations of the q’-distribution
for sample sizes n up to 50. For larger sample sizes the normal approximation
for $n_1$ does not seem to entail errors of practical importance.

To derive the exact distribution of $n_1$ under the null hypothesis we suppose
that n equals 2k. The probability that any k sample points shall have smaller
x-values than the other k points is

$$\binom{2k}{k}^{-1}.$$

Hence, since any arrangement of the sample points according to their x-values
does not affect the distribution of the y-values,

$$P\{n_1 = 2r\} = \binom{k}{r}^{2} \Big/ \binom{2k}{k}. \tag{13}$$

If $n = 2k + 1$ it is easily verified that the probability (13) remains unchanged,
if we use the procedure in calculating $n_1$ and $n_2$ proposed in Section 2. This is,
in fact, the main reason for the proposal.
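
The exact null distribution is simple to evaluate. The sketch below is illustrative, with hypothetical function names; it assumes that (13) is the hypergeometric form $P\{n_1 = 2r\} = \binom{k}{r}^2 / \binom{2k}{k}$ and computes the tail probabilities $P\{|n_1 - k| \ge v\}$ in exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def prob_n1(k, r):
    """P{n_1 = 2r} under independence, for 0 <= r <= k."""
    return Fraction(comb(k, r) ** 2, comb(2 * k, k))


def tail_prob(k, v):
    """P{|n_1 - k| >= v} under independence."""
    return sum((prob_n1(k, r) for r in range(k + 1) if abs(2 * r - k) >= v), Fraction(0))


if __name__ == "__main__":
    for k, v in [(2, 2), (4, 2), (4, 4), (3, 3), (5, 3)]:
        print(2 * k, v, round(float(tail_prob(k, v)), 4))
```

Because $n_1 = 2r$ is always even, $|n_1 - k|$ takes only even values when k is even and only odd values when k is odd.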


Table of P{|n_1 − k| ≥ v}

(k even; entries marked … are illegible or missing)

  v \ 2k     4      8     12     16     20     24     28     32     36     40     44     48
    0      1.000  1.000  1.000  1.000    …    1.000  1.000  1.000  1.000  1.000  1.000    …
    2       .333   .486   .567   .619    …     .684   .706   .724   .740   .752   .764    …
    4              .029   .080   .132    …     .220   .257   .289   .318   .343   .366    …
    6                     .0022  .010    …     .039   .057   .076   .094   .113   .131    …
    8                            .0002   …     .0033  .0070  .012   .018   .026   .034    …
   10                                          .0001  .0004  .0011  .0022  .0038  .0060   …
   12                                                               .0002  .0004  .0007   …
   14                                                                             .0001   …

(k odd)

  v \ 2k     6     10     14     18     22     26     30     34     38     42     46     50
    1      1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
    3       .100   .206   .286   .347   .395   .434   .466   .494   .517   .538   .556   .572
    5              .0079  .029   .057   .086   .115   .143   .169   .194   .217   .238   .258
    7                     .0006  .0034  .0089  .017   .027   .038   .050   .063   .076   .089
    9                                   .0003  .0012  .0028  .0053  .0086  .013   .017   .023
   11                                                 .0001  .0004  .0009  .0017  .0028  .0042
   13                                                               .0001  .0001  .0003  .0005

2k is the largest even number contained in the sample size.

The distribution of $n_1$ is symmetric about $n_1 = k$ with the variance

$$\frac{k^2}{2k - 1}.$$

Thus, in testing independence we can for large sample sizes use

$$\frac{n_1 - k}{k}\sqrt{2k - 1}$$

as an approximately normally distributed random variable with mean zero
and unit s.d.
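
For large samples the standardized statistic can be computed directly. The sketch below is a minimal illustration with hypothetical function names; it uses the variance $k^2/(2k-1)$ of $n_1$ and applies no continuity correction, which is an assumption of the sketch rather than a recommendation of the paper.

```python
from math import sqrt
from statistics import NormalDist


def independence_z(n1, k):
    """Standardized statistic (n_1 - k) * sqrt(2k - 1) / k."""
    return (n1 - k) / k * sqrt(2 * k - 1)


def two_sided_p_value(n1, k):
    """Two-sided normal-approximation p-value for the hypothesis of independence."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(independence_z(n1, k))))


if __name__ == "__main__":
    # Sample of size 100 (k = 50) with n_1 = 60 points in the first and third quadrants.
    print(independence_z(60, 50), two_sided_p_value(60, 50))
```

For sample sizes of 50 or less the exact tail probabilities are preferable to this approximation.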


7. The asymptotic efficiency of the q’-test. In the case that x and y are
normally distributed with the correlation coefficient $\rho$, it is possible, but rather
tedious, to calculate the power function of the q’-test. We will, therefore, restrict
ourselves to considering only the asymptotic behavior of the power function.

Consider tests of independence ($\rho = 0$) against one-sided alternatives $\rho > 0$.
Let $L_q^{(m)}(\rho)$ be the power function of the q’-test for the sample size m and $L_r^{(n)}(\rho)$
be the power function of the test based on the correlation coefficient r in a
sample of size n. We assume that all tests have the same size, i.e.

$$L_q^{(m)}(0) = L_r^{(n)}(0) = \alpha \tag{14}$$

for all m and n. We shall say that the q’-test has the asymptotic efficiency e if

$$\lim \frac{\bigl(\partial L_q^{(m)}/\partial\rho\bigr)_{0}}{\bigl(\partial L_r^{(n)}/\partial\rho\bigr)_{0}} = 1 \tag{15}$$

when $m \to \infty$, $n \to \infty$ and $n/m \to e$.

This means that the sample size in using the r-test need only be 100e % of
that in using the q’-test, in order to get the same derivative of the power functions
at $\rho = 0$ (for large sample sizes). Since the definition of e only concerns the
behavior in the neighborhood of $\rho = 0$, it might perhaps be more correct to call e
the asymptotic local efficiency.

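In order to calculate e we define two sequences $\{q_m\}$ and $\{r_n\}$ such that
$\{q' > q_m\}$ and $\{r > r_n\}$ are tests with the afore mentioned properties. According
to (9) and (10) q’ is asymptotically normally distributed with mean q and s.d.
$\sqrt{(1 - q^2)/m}$. Furthermore, r is asymptotically normally distributed with mean
$\rho$ and s.d. $(1 - \rho^2)/\sqrt{n}$. Hence,

$$1 - L_q^{(m)}(\rho) = P\{q' < q_m \mid \rho\} \sim \Phi\Bigl[\frac{q_m - q}{\sqrt{1 - q^2}}\sqrt{m}\Bigr],$$

$$1 - L_r^{(n)}(\rho) = P\{r < r_n \mid \rho\} \sim \Phi\Bigl[\frac{r_n - \rho}{1 - \rho^2}\sqrt{n}\Bigr],$$

from which it follows

$$\Bigl(\frac{\partial L_q^{(m)}}{\partial\rho}\Bigr)_0 \sim \Phi'(q_m\sqrt{m})\cdot\Bigl(\frac{dq}{d\rho}\Bigr)_0\sqrt{m},$$

$$\Bigl(\frac{\partial L_r^{(n)}}{\partial\rho}\Bigr)_0 \sim \Phi'(r_n\sqrt{n})\,\sqrt{n}.$$

According to (14) we have

$$\lim_{m\to\infty} q_m\sqrt{m} = \lim_{n\to\infty} r_n\sqrt{n} = \Phi^{-1}(1 - \alpha). \tag{16}$$

Thus we conclude

$$\lim \frac{\bigl(\partial L_q^{(m)}/\partial\rho\bigr)_0}{\bigl(\partial L_r^{(n)}/\partial\rho\bigr)_0} = \Bigl(\frac{dq}{d\rho}\Bigr)_0 \lim\sqrt{\frac{m}{n}}. \tag{17}$$

Hence, according to (12) and (15)

$$e = \Bigl(\frac{dq}{d\rho}\Bigr)_0^{2} = \Bigl(\frac{2}{\pi}\Bigr)^{2} = \frac{4}{\pi^2}.$$

In other words, the asymptotic efficiency of the q’-test is about 41%.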
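
The figure of about 41% can be reproduced numerically. The sketch below is an illustration with hypothetical function names; it assumes that the efficiency is the square of the slope of $q = (2/\pi)\arcsin\rho$ at $\rho = 0$, that is $e = (2/\pi)^2$, and shows the r-test sample size that would match a q’-test of size m to first order.

```python
from math import ceil, pi


def asymptotic_efficiency():
    """e = (dq/d rho at rho = 0)^2 with q = (2/pi) * arcsin(rho)."""
    return (2.0 / pi) ** 2


def matching_r_sample_size(m):
    """Sample size for the r-test giving the same local power as a q'-test of size m."""
    return ceil(asymptotic_efficiency() * m)


if __name__ == "__main__":
    print(asymptotic_efficiency())
    print(matching_r_sample_size(1000))
```

The comparison is local, so it describes behavior only for alternatives close to $\rho = 0$ and for large samples.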


8. Concluding remarks. An interesting similarity exists between the q’-test 
of independence and a test of equal location parameters in two distributions, 
constructed in the following way. Suppose that two samples of equal size, say k, 
are drawn independently from two distributions. Compute the number of 
individuals, say r, in the first sample, falling short of the median of the pooled 
samples. Then the distribution of 2r under the null hypothesis is the same as
that of $n_1$ in the q’-test for sample size 2k (or 2k + 1). The test based on r was
discussed by F. Mosteller [7]. 

Another similarity is between the q’-test and a special case of the exact test of 
independence in a 2 x 2 table [8]. If in such a table the marginals happen to be cut 
at the 50% points the two test procedures become identical. 
SOME TWO SAMPLE TESTS


By DOUGLAS G. CHAPMAN¹


University of Washington

1. Introduction and summary. Stein [4] has exhibited a double sampling pro- 
cedure to test hypotheses concerning the mean of normal variables with power 
independent of the unknown variances. This procedure is here adapted to test 
hypotheses concerning the ratio of means of two normal populations, also with 
power independent of the unknown variances. The use of a two sample procedure 
in a regression problem is also considered. 

Let $\{X_{ij}\}$ $(i = 1, 2)$ $(j = 1, 2, 3, \cdots)$ be independent random variables
distributed according to $N(m_i, \sigma_i^2)$; all parameters are assumed to be unknown.

Defining k by the equation


(1) $m_1 = k m_2$


we wish to test the hypothesis H that k has a specified value $k_0$.

If $k_0 = 1$ the hypothesis H reduces to a classical problem, often referred to
in the literature as the Behrens-Fisher problem (cf. Scheffé [3] for a bibliography).
At the present time it is still an open question whether it is possible (or desirable)
to find a non-trivial single sample test for H with the size of the critical region
independent of $\sigma_1$ and $\sigma_2$. In any case it is a simple extension of the result of
Dantzig [1] (cf. also Stein [4]) to show that no non-trivial single sample test
exists whose power is independent of $\sigma_1$ and $\sigma_2$.

On the other hand the case $k_0 \ne 1$ may be expected to occur frequently in
fields of application where a choice must be made between different products,
methods of experimentation etc. which involve different costs. The statistician
must make a choice on the basis of results relative to the ratio of costs involved.
Nevertheless this problem appears to have received little attention in the
literature.

In general tests based on a two-sample procedure may not be as "efficient"
in the sense of Wald [5] as a strict sequential procedure. On the other hand the
two sample procedure reduces the number of decisions to be made by the
experimenter and it will, in certain fields, simplify the experimental procedure.


























2. The two sample procedure. Stein’s double sampling procedure (which may 
be denoted procedure S) to test a hypothesis concerning the mean of a normal 
population consists briefly in the following steps: 

(a) Choose "a priori" a positive number z and a preliminary sample size n.

(b) Take n independent observations $x_1, \cdots, x_n$ of the random variable X,
which is assumed to be distributed according to $N(m, \sigma^2)$ with unknown mean m
and unknown variance $\sigma^2$, and calculate

(2) $$v^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}.$$

(c) Let $N = \max\left(\left[\frac{v^2}{z}\right] + 1,\ n + 1\right)$, where [r] = largest integer < r.

(d) Take N - n more independent observations of X and choose a set of
constants $a_1, \cdots, a_N$ such that

(3) (i) $\sum_{i=1}^{N} a_i = 1$, (ii) $a_1 = a_2 = \cdots = a_n$, (iii) $v^2 \sum_{i=1}^{N} a_i^2 = z$.

(e) Then $\dfrac{\sum_{i=1}^{N} a_i x_i - m}{\sqrt{z}}$ has Student's t-distribution with n - 1 degrees of
freedom.

¹ This research was carried out while the author was at the University of California,
Berkeley, and was supported in part by the Office of Naval Research.

Stein further showed that the procedure may be modified to some advantage 
in problems dealing with a single population. This modification is not applicable 
in the problems under consideration here. 

There remains to be discussed briefly the choice of n, z and the a’s. The pre- 
liminary sample size n may be determined by other considerations or it may be 
chosen as part of the design of the experiment. Hodges [2] has shown that the 
expected value of the total sample size N and the power of the test both depend 
on the choice of n and he has discussed the optimum choice of n with respect 
to the modified procedure of Stein. In general this optimum choice of n depends 
upon prior knowledge concerning the variance. 

The power of the test will depend upon z: some considerations concerning 
the choice of z will be dealt with after discussing the tables upon which the 
two sample tests are based. 

The arbitrariness involved in choosing the a's may be eliminated by placing
the additional requirement that


(4) $a_{n+1} = a_{n+2} = \cdots = a_N = b$ (say).


Letting $a_1 = a_2 = \cdots = a_n = a$, it is elementary to solve for a and b explicitly,
viz.,


(5) $na + (N - n)b = 1$, $\quad na^2 + (N - n)b^2 = \dfrac{z}{v^2}$.

The solutions are

(6) $$b = \frac{1}{N}\left(1 + \sqrt{\frac{n(Nz - v^2)}{(N - n)v^2}}\right),$$

(7) $$a = \frac{1 - (N - n)b}{n}.$$
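As a rough numerical illustration of steps (c) and (d) together with the equal-weight solution (6) and (7), the sketch below computes N and the two weights and checks conditions (3)(i) and (3)(iii). The function names are hypothetical, `v2` stands for the preliminary sample variance of (2), and the input values are invented for the example.

```python
import math

def stein_total_size(n, v2, z):
    """Step (c): N = max([v2/z] + 1, n + 1), with [r] the largest integer < r."""
    largest_integer_below = math.ceil(v2 / z) - 1
    return max(largest_integer_below + 1, n + 1)

def stein_weights(n, N, v2, z):
    """Weight a for the first n observations and b for the remaining N - n."""
    root = math.sqrt(n * (N * z - v2) / ((N - n) * v2))
    b = (1.0 + root) / N            # equation (6)
    a = (1.0 - (N - n) * b) / n     # equation (7)
    return a, b

# Illustrative (made-up) inputs: preliminary size, sample variance, chosen z.
n, v2, z = 10, 4.1, 0.25
N = stein_total_size(n, v2, z)
a, b = stein_weights(n, N, v2, z)

sum_of_weights = n * a + (N - n) * b                 # condition (3)(i)
scaled_sum_sq = v2 * (n * a * a + (N - n) * b * b)   # condition (3)(iii)
```

Because N exceeds $v^2/z$ whenever the first term of the maximum is the larger one, the quantity under the square root is never negative.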
3. Test for H. The steps involved in testing the hypothesis H are

(a) Choose the preliminary sample size n, and positive numbers $z_1, z_2$ subject
to the restriction


(8) $z_1 = k_0^2 z_2$.

(b) Carry out procedure S with the same n for each population, determining
two statistics $T_1, T_2$, i.e.

(9) $$T_i = \frac{\sum_{j=1}^{N_i} a_{ij} x_{ij}}{\sqrt{z_i}} \qquad (i = 1, 2).$$

Then $T_1 - T_2$ has, under the hypothesis tested, the distribution of the difference
of two independent Student variables.

If s denotes the difference of two independent random variables $t_1$ and $t_2$
each distributed according to Student's t-distribution with n - 1 degrees of
freedom and if $s_0$ is defined by the equation

$$P(|s| > s_0) = \alpha,$$

then a test of size $\alpha$ is given by the rule: H is rejected if $|T_1 - T_2| > s_0$.





4. The distribution of differences of Student variables. The distribution of s
is easily found by the method of characteristic functions, in case n is even.
Let m = n - 1 and to simplify slightly put

(10) $$y_i = \frac{t_i}{\sqrt{m}} \qquad (i = 1, 2).$$

Then the density function of $y_i$ is

(11) $$f(y) = \frac{\Gamma\left(\frac{m+1}{2}\right)}{\sqrt{\pi}\,\Gamma\left(\frac{m}{2}\right)} \cdot \frac{1}{(1 + y^2)^{(m+1)/2}}$$

and its characteristic function

(12) $$\varphi(t) = \int_{-\infty}^{+\infty} e^{ity} f(y)\,dy$$

(13) $$\qquad = \frac{\left(\frac{m-1}{2}\right)!}{(m-1)!}\; e^{-|t|} \sum_{r=0}^{(m-1)/2} \frac{\left(\frac{m-1}{2} + r\right)!}{r!\,\left(\frac{m-1}{2} - r\right)!}\,[2|t|]^{(m-1)/2 - r}.$$

Formula (13) may be obtained by contour integration; it is, however, a standard
formula in connection with Bessel functions of the second kind of purely imaginary
argument (cf. Watson [6], pp. 80, 185-188).
While it is not possible to obtain a simple general expression for

(14) $$f(w) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} e^{-itw}\,[\varphi(t)]^2\,dt,$$

the density function of $w = s/\sqrt{m}$, this integral may be evaluated for m = 1, 3, 5,
etc., and furthermore the density function of s may be integrated in a closed form
for such values of m, and consequently tabulated fairly easily.
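The simplest closed-form case is m = 1 (n = 2): each Student variable is then a standard Cauchy variable, and the difference of two independent standard Cauchy variables is again Cauchy with scale 2. The sketch below, with hypothetical function names, uses that fact to reproduce the n = 2 entries of Tables I and II; it is offered as a check on the tables rather than as the method used to compute them.

```python
import math

def prob_0_to_s1_n2(s1):
    """P(0 <= s <= s1) for n = 2, where s is Cauchy with location 0 and scale 2."""
    return math.atan(s1 / 2.0) / math.pi

def critical_value_n2(alpha):
    """The point s0 with P(|s| >= s0) = alpha for n = 2."""
    return 2.0 * math.tan(math.pi * (1.0 - alpha) / 2.0)

table_entry_at_2 = prob_0_to_s1_n2(2.0)     # Table I lists .2500
five_percent_point = critical_value_n2(0.05)  # Table II lists 25.41
one_percent_point = critical_value_n2(0.01)   # Table II lists 127.3
```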

In case n is odd it is possible to express $\varphi(t)$ in terms of Bessel functions but
the Bessel functions obtained are not expressible in a closed form. While the
problem may be attacked directly by numerical integration, it will generally be
sufficient to interpolate in Table I where necessary, for such values of n.

Table I gives the distribution of s for n = 2, 4, 6, 8, 10, 12. For larger values
of n it may be sufficiently accurate to use the normal approximation to the
distribution of s. In virtue of the asymptotic normality of the t-distribution s
will be distributed approximately normally with mean zero and variance $\dfrac{2(n-1)}{n-3}$
for n sufficiently large.
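The normal approximation can be checked against the last column of Tables I and II. The sketch below takes the variance of s to be $2(n-1)/(n-3)$, that is, twice the variance of a Student variable with n - 1 degrees of freedom; the helper name is hypothetical.

```python
import math
from statistics import NormalDist

def normal_approx(n):
    """Normal approximation to s: mean 0, variance 2(n - 1)/(n - 3)."""
    return NormalDist(mu=0.0, sigma=math.sqrt(2.0 * (n - 1) / (n - 3)))

approx = normal_approx(12)
p_half = approx.cdf(0.5) - 0.5     # compare with .1254 in Table I
s0_5pct = approx.inv_cdf(0.975)    # compare with 3.06 in Table II
s0_1pct = approx.inv_cdf(0.995)    # compare with 4.03 in Table II
```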


5. Power of the test. Writing

(15) $$\Delta = \frac{m_1}{\sqrt{z_1}} - \frac{m_2}{\sqrt{z_2}} \quad \text{and} \quad T = T_1 - T_2$$

it is seen that $T = s + \Delta$ and hence

(16) $$P(H \text{ is rejected}) = P(|T| > s_0) = P(s < -s_0 - \Delta) + P(s > s_0 - \Delta).$$

Since

$$\Delta = \frac{m_2}{\sqrt{z_2}}\left(\frac{k}{k_0} - 1\right),$$

equation (16) may be used as a guide in choosing $z_2$ so that a certain minimum
power is attained; the presence of the nuisance parameter $m_2$ makes impossible
the determination of $z_2$ so as to give exactly some preassigned power.

Since s is distributed independently of $\sigma_1, \sigma_2$, it follows that the power of the
test is independent of these parameters. Using the addition formula to express
the frequency function of s in terms of the frequency function of Student's
t-distribution, it may be shown that f(s) is unimodal and symmetrical about
s = 0. Hence the test is unbiased. It also follows from (16) that if $z_2$ is made to
approach zero the probability of rejecting H when it is false tends to 1; i.e.
the test is consistent.

It may be observed that tests for the one-sided hypotheses

$$\frac{m_1}{m_2} \le k_0 \quad \text{or} \quad \frac{m_1}{m_2} \ge k_0$$

may easily be formulated. Table II provides a table useful for such tests also,
at half the indicated significance levels.
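Equation (16) can be evaluated directly once the distribution of s is available. The sketch below does so only for n = 2, where s is Cauchy with scale 2; this special case is chosen for illustration because its distribution function has a closed form, and the function names are hypothetical.

```python
import math

def cauchy2_cdf(x):
    """Distribution function of s for n = 2 (Cauchy, location 0, scale 2)."""
    return 0.5 + math.atan(x / 2.0) / math.pi

def power_n2(delta, alpha=0.05):
    """P(s < -s0 - delta) + P(s > s0 - delta), as in (16), for n = 2."""
    s0 = 2.0 * math.tan(math.pi * (1.0 - alpha) / 2.0)
    return cauchy2_cdf(-s0 - delta) + (1.0 - cauchy2_cdf(s0 - delta))

size_at_null = power_n2(0.0)     # equals alpha when delta = 0
power_far_away = power_n2(1e6)   # tends to 1 as |delta| grows
```

The power depends on the alternative only through $\Delta$ and is symmetric in its sign, which mirrors the unbiasedness remark above.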


TABLE I
Distribution of s: difference of two independent Student variables with n - 1 degrees of freedom
The value tabled is $P(0 \le s \le s_1)$

   s_1  | n = 2 | n = 4 | n = 6 | n = 8 | n = 10 | n = 12 | Normal approximation for n = 12
   0.50 | .0780 | .1014 | .1222 | .1265 | .1290  | .1306  | .1254
   1.00 | .1476 | .1922 | .2311 | .2392 | .2438  | .2467  | .2388
   1.50 | .2048 | .2660 | .3185 | .3290 | .3349  | .3386  | .3313
   2.00 | .2500 | .3243 | .3825 | .3939 | .4002  | .4041  | .3996
   2.50 | .2852 | .3620 | .4260 | .4364 | .4415  | .4465  | .4451
   3.00 | .3128 | .3903 | .4542 | .4637 | .4687  | .4724  | .4725
   3.50 | .3348 | .4104 | .4726 | .4796 | .4834  | .4856  | .4874
   4.00 | .3524 | .4247 | .4825 | .4884 | .4914  | .4929  | .4947
   4.50 | .3669 | .4352 | .4890 | .4936 | .4956  | .4966  | .4980
   5.00 | .3789 | .4431 | .4930 | .4964 | .4977  |        |
   5.50 | .3890 | .4491 | .4955 | .4980 | .4988  |        |
   6.00 | .3976 | .4539 | .4970 | .4988 |        |        |
   6.50 | .4050 | .4578 | .4980 |       |        |        |
   7.00 | .4114 | .4611 | .4986 |       |        |        |
   7.50 | .4170 | .4638 |       |       |        |        |
   8.00 | .4220 | .4661 |       |       |        |        |
  10.00 | .4372 | .4730 |       |       |        |        |
  12.00 | .4474 | .4774 |       |       |        |        |
  21.00 | .4698 | .4870 |       |       |        |        |
  30.00 | .4788 | .4908 |       |       |        |        |
  50.00 | .4873 |       |       |       |        |        |
 100.00 | .4936 |       |       |       |        |        |

TABLE II
The 5% and 1% significance points of the distribution of s
The value tabled is $s_0$

 Significance level                  | n = 2 | n = 4 | n = 6 | n = 8 | n = 10 | n = 12 | Normal approximation for n = 12
 $P(\lvert s \rvert \ge s_0) = .05$  | 25.41 | 10.82 | 3.62  | 3.34  | 3.18   | 3.10   | 3.06
 $P(\lvert s \rvert \ge s_0) = .01$  | 127.3 | 36.8  | 5.38  | 4.72  | 4.42   | 4.26   | 4.03





6. A regression problem. We consider the problem where $x_i$ are values of a
sure variable, $Y_i$ are independent random variables with


(17) $E(Y_i) = a + b x_i$


and $\sigma_{Y_i}$ is unknown. It is desired to estimate a and b and to test the hypothesis
$b = b_0$.

The usual procedure is to assume $\sigma_{Y_i}$ constant, and use the Markov theorem
(i.e. the standard least squares formulae). In this way unbiased estimates of
a and b are obtained, whether or not this assumption is fulfilled. However the
usual significance test for b is not valid if this assumption (plus normality of
the Y's) is not fulfilled.

The two sample procedure leads to a valid test of the hypothesis $b = b_0$, with
power independent of the unknown variance. Since linearity of the expected
value of Y on x is assumed, the optimum procedure is to observe Y for only two
values of x, at opposite ends of the range. Let these points be $x_1, x_2$. For these
values of x, procedure S may be used (choosing $z_1 = z_2 = z$) to determine $T_1, T_2$,
where $T_i - (a + b x_i)/\sqrt{z}$ has Student's t-distribution with n - 1 degrees of
freedom.

Then the following estimates of a, b are unbiased, for n > 3,


(18) $$\hat{a} = \frac{x_1 T_2 - x_2 T_1}{x_1 - x_2}\,\sqrt{z}, \qquad \hat{b} = \frac{T_1 - T_2}{x_1 - x_2}\,\sqrt{z}.$$


To test the hypothesis $H_b$: $b = b_0$ it is necessary only to calculate the statistic
$t = [(T_1 - T_2)\sqrt{z} - b_0(x_1 - x_2)]/\sqrt{z}$ and reject $H_b$ at the $\alpha$ level of
significance if $|t| > s_0$, where $s_0$ was defined above (Section 3).

It is seen that if b' is the true value of b, then the power of the test is a function
of $(b' - b_0)(x_1 - x_2)/\sqrt{z}$ and z may be determined to obtain any prescribed power
desired. It is also immediate that the power of the test is independent of $\sigma_{Y_i}$.
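A small sketch of the estimates and the test statistic may help fix the algebra. It assumes the two statistics produced by procedure S at the end points are already in hand; the function name and the numbers are hypothetical. With the Student noise set to zero, the estimates should return the true a and b, and the statistic should equal $(b' - b_0)(x_1 - x_2)/\sqrt{z}$.

```python
import math

def two_point_regression(T1, T2, x1, x2, z, b0):
    """Estimates of a and b and the test statistic for H_b: b = b0."""
    root_z = math.sqrt(z)
    a_hat = root_z * (x1 * T2 - x2 * T1) / (x1 - x2)
    b_hat = root_z * (T1 - T2) / (x1 - x2)
    t = ((T1 - T2) * root_z - b0 * (x1 - x2)) / root_z
    return a_hat, b_hat, t

# Noise-free illustration: T_i = (a + b * x_i) / sqrt(z) exactly.
a_true, b_true, z = 1.5, -2.0, 0.25
x1, x2 = 0.0, 4.0
T1 = (a_true + b_true * x1) / math.sqrt(z)
T2 = (a_true + b_true * x2) / math.sqrt(z)

a_hat, b_hat, t_at_truth = two_point_regression(T1, T2, x1, x2, z, b0=b_true)
_, _, t_at_zero = two_point_regression(T1, T2, x1, x2, z, b0=0.0)
```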

The author wishes to express thanks to the members of the computing staff 
of the Statistical Laboratory, University of California, Mrs. E. Putz, Miss J. 
Linton, and Mr. J. Blum, for assistance in preparing Tables I and II.²

² It has been pointed out to the writer that percent points of linear combinations of
two independent Student t's are given in Table VI (by P. V. Sukhatme) in R. A. FISHER
AND F. YATES, Statistical Tables for Biological, Medical and Agricultural Research, Oliver
and Boyd, Edinburgh, 1943 (added in page proof).


NOTES 


This section is devoted to brief research and expository articles and other short items. 


TRANSFORMATIONS RELATED TO THE ANGULAR AND 
THE SQUARE ROOT 


By MURRAY F. FREEMAN AND JOHN W. TUKEY¹
Princeton University 


1. Summary. The use of transformations to stabilize the variance of binomial 
or Poisson data is familiar (Anscombe [1], Bartlett [2, 3], Curtiss [4], Eisenhart 
[5]). The comparison of transformed binomial or Poisson data with percentage 
points of the normal distribution to make approximate significance tests or 
to set approximate confidence intervals is less familiar. Mosteller and Tukey [6] 
have recently made a graphical application of a transformation related to the 
square-root transformation for such purposes, where the use of ‘binomial 
probability paper” avoids all computation. We report here on an empirical study 
of a number of approximations, some intended for significance and confidence 
work and others for variance stabilization. 

For significance testing and the setting of confidence limits, we should like
to use the normal deviate K exceeded with the same probability as the number of
successes x from n in a binomial distribution with expectation np, which is
defined by

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{K} e^{-t^2/2}\,dt = \text{Prob}\{x \le k \mid \text{binomial}, n, p\}.$$


The most useful approximations to K that we can propose here are N (very
simple), N* (accurate near the usual percentage points), and N** (quite accurate
generally), where

$$N = 2\left(\sqrt{(k + 1)q} - \sqrt{(n - k)p}\right).$$

(This is the approximation used with binomial probability paper.)

$$N^{+} = N + \frac{N + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq,$$

$$N^{*} = N + \frac{(N - 2)(N + 2)}{12}\left(\frac{1}{\sqrt{np + 1}} - \frac{1}{\sqrt{nq + 1}}\right),$$

$$N^{**} = N^{*} + \frac{N^{*} + 2p - 1}{12\sqrt{E}}, \qquad E = \text{lesser of } np \text{ and } nq.$$
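To see how these approximations behave, the sketch below evaluates them for the case n = 50, np = 5 with k = 5 and compares them with the exact normal deviate K obtained from the binomial sum. It assumes q = 1 - p and reads the four formulas as N, N+, N* and N**; the function and variable names are hypothetical.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Prob{x <= k} for a binomial distribution, by direct summation."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def approximations(k, n, p):
    """Return (N, N_plus, N_star, N_2star) as defined in the text."""
    q = 1.0 - p
    E = min(n * p, n * q)
    N = 2.0 * (math.sqrt((k + 1) * q) - math.sqrt((n - k) * p))
    N_plus = N + (N + 2 * p - 1) / (12 * math.sqrt(E))
    correction = 1 / math.sqrt(n * p + 1) - 1 / math.sqrt(n * q + 1)
    N_star = N + (N - 2) * (N + 2) / 12 * correction
    N_2star = N_star + (N_star + 2 * p - 1) / (12 * math.sqrt(E))
    return N, N_plus, N_star, N_2star

n, p, k = 50, 0.1, 5
K = NormalDist().inv_cdf(binom_cdf(k, n, p))   # exact normal deviate
N, N_plus, N_star, N_2star = approximations(k, n, p)
```

In this one case the error of N is about 0.11 and that of N** under 0.01, in line with the error bounds of 0.20 and 0.04 quoted for |K| < 2.5.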


For variance stabilization, the averaged angular transformation

$$\sin^{-1}\sqrt{\frac{x}{n + 1}} + \sin^{-1}\sqrt{\frac{x + 1}{n + 1}}$$

has variance within ±6% of

$$\frac{1}{n + \frac{1}{2}} \ \text{(angles in radians)}, \qquad \frac{821}{n + \frac{1}{2}} \ \text{(angles in degrees)},$$

for almost all cases where np ≥ 1.
In the Poisson case, this simplifies to using

$$\sqrt{x} + \sqrt{x + 1}$$

as having variance 1.

¹ Prepared in connection with research sponsored by the Office of Naval Research.


2. Significance testing. In addition to the approximations mentioned above, 
empirical study was also made of the following 


$$L = \frac{x - np}{\sqrt{npq}},$$
L* = L modified by a term like that in N*, 


ia i I 
M 2Vn +i (sin 7 - ve); 


M* = M modified by a term like that in N*. 


Taking an upper limit of 2.5 or 3.5 on |K| and a lower limit of 0.01, 1, or 4
on np, the greatest observed errors of the approximations were smallest for
N**, N* and M* and largest for the direct approximations L and L*. This
was true for all six choices of region.

If we exclude the cases k = 0 and k = n, where the desired probability can be 
calculated directly, the largest observed errors in the substantial number of 
cases computed, which are probably representative of the regions where the 
approximations are worst, were as follows: 


|K| | E=np Nn** 


Largest observed error of 
N* Nt N 


<2.5 >4 ot 16.17 
>1 0 19 «6.20.24 
>0.01 .04 j : .1¢ .20 


>4 08.07) 19 #25 2 63 
as 11.10 38 1.26 
| 3001. 21 «4.65 —ti«C*«*SG‘“ 3.42 


Within the range of great interest, | K | < 2.5, that is .0062 < probability 
< .9938, we have errors of less than 0.04 in N** and less than 0.20 in N. 
For 1.5 < | K | < 2.5, the range of greatest interest, the average error of 


N+ was less than 0.03 and the maximum was 0.08 (54 cases considered). 







Thus, we can recommend 
N —as a simple and usually accurate transformation, 
N* —for rapid significance testing, 
N**—for adequate accuracy at all levels. 


Figure 1 shows the behavior of the various approximations in the case n = 50, 
np = 5. This is roughly typical. 
Fig. 1. Errors of approximation for n = 50, np = 5.


3. Variance stabilization. The various suggestions for stabilizing the variance 
of the Poisson are: 


$\sqrt{x + 1/2}$ (Bartlett [2]),
$\sqrt{x + 3/8}$ (Anscombe [1]),
$\sqrt{x} + \sqrt{x + 1}$ (this paper).


Figure 2 shows the variance of the transformed variate as a function of the
Poisson expectation. Clearly $\sqrt{x} + \sqrt{x + 1}$ is the best if small expectations
are to be considered. The simplicity with which it can be read from a square-root
table, and its unit variance, are also favorable factors.

When an approximation of a given form is to work over as large a range as 
possible without the magnitude of its errors exceeding a certain limit, the
optimum approximation is almost certain to involve errors of both signs. If ±6%
variation in variance is permissible, $\sqrt{x} + \sqrt{x + 1}$ is usable for expectations
of unity or more. It is not surprising that Anscombe's approximation, obtained
by eliminating the term in $n^{-1}$, and dominated by the term in $n^{-2}$, should only
meet the ±6% tolerance for expectations of 2.2 or more.
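The variance of a transformed Poisson variate can be computed by direct summation, which gives a quick check on these statements. In the sketch below the function names are hypothetical, and Anscombe's form is doubled to $2\sqrt{x + 3/8}$ so that its limiting variance is also 1 and the two transformations are on the same scale.

```python
import math

def poisson_variance(transform, mean, terms=400):
    """Variance of transform(x) for Poisson x with the given mean."""
    pmf = math.exp(-mean)           # probability of x = 0
    first = second = 0.0
    for x in range(terms):
        y = transform(x)
        first += pmf * y
        second += pmf * y * y
        pmf *= mean / (x + 1)       # step to the probability of x + 1
    return second - first * first

def sum_of_roots(x):
    return math.sqrt(x) + math.sqrt(x + 1)

def anscombe_doubled(x):
    return 2.0 * math.sqrt(x + 0.375)

var_sum_at_1 = poisson_variance(sum_of_roots, 1.0)
var_anscombe_at_1 = poisson_variance(anscombe_doubled, 1.0)
var_sum_at_50 = poisson_variance(sum_of_roots, 50.0)
```

At expectation 1 the sum of roots has variance close to 0.94, right at the edge of the ±6% band, while the doubled Anscombe form is near 0.72.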


Fig. 2. Stabilization of Poisson variance. (Vertical axis: ratio of actual to limiting variance.)


4. Scope. Values of K, and with some occasional exceptions, of L, L*,
M*, N, N+, N* and N** were calculated for


n = 2, 5, 10, 20, 100,
p = 1%, 2%, 5%, 10%, 20%, 30%, 40%, 50%,
k giving K < 4.5,
and similar computations were made for the Poisson case with expectations


1/100, 1/40, 1/20, 1/10, 1/5, 1/2, 1, 2, 4, 8, 16, 32, 64. 
These computations were made to only two decimal places, so that the final
results may easily err by 1, 2, or 3 in the second decimal place.

A more complete discussion of the problem, the origin of the approximations, 
and tables showing a representative collection of actual values can be found in 
Memorandum Report 24 of the Statistical Research Group, Princeton Univer- 
sity, which bears the same title as this note. Copies may be obtained from its 
Secretary, Box 708, Princeton, N. J. 
REMARK ON THE ARTICLE "ON A CLASS OF DISTRIBUTIONS THAT
APPROACH THE NORMAL DISTRIBUTION FUNCTION" BY
GEORGE B. DANTZIG¹


By T. N. E. GREVILLE 
Federal Security Agency 


In this interesting and valuable article, Dr. Dantzig showed that, under 
certain conditions, a sequence of frequency distributions connected by a linear 
recurrence formula converges to the normal distribution. Among several applica- 
tions of his results which are discussed, the author mentions their relation to 
certain types of smoothing formulas, and has shown that if a linear smoothing 
formula and the data to which it is applied satisfy certain conditions, the iteration 
of the smoothing process produces a sequence of smoothed distributions which, 
upon normalization, approaches the normal frequency curve. 

In a summary paragraph at the end of the article, it is stated that ‘‘successive 
application of one or many such linear formulas will usually smooth any set of 
values to the normal curve of error.” The entire article was concerned with 
frequency distributions, and a careful reading makes it clear that the author 
intended the quoted statement to apply only to data in this form. However, its 
rather general wording seems to have led a number of readers to interpret it as 
being applicable to other types of data, such as time series, which frequently 
may not satisfy the conditions assumed. Moreover, it is easy to overlook the 


1 Annals of Math. Stat., Vol. 10 (1939), pp. 247-253. 
restrictions imposed on both the original data and the smoothing formula as
they are stated only by implication, and not explicitly, even though they have 
the effect of excluding important classes of smoothing formulas, such as those 
commonly employed by actuaries. 

The approach to the normal distribution is shown to depend on the vanishing
of a certain limit denoted as Γ, which is a function of the moments of the original
data and of a distribution in which the weights employed in the smoothing
formula are interpreted as frequencies. At this point, objection may be taken
to Dr. Dantzig's proof, since the smoothing formulas most frequently used
contain negative weights. However, it has been shown elsewhere² that the
occurrence of negative weights will not of itself prevent the sequence of smoothed
distributions from approaching the normal curve. A somewhat more serious
difficulty arises if, as is commonly the case, the smoothing formula has the
property of reproducing polynomials of a specified degree. If the degree
reproduced is two or more, this implies the vanishing of the second moment of the
weight distribution, in which case the limit Γ does not vanish. In fact, it has
been shown by DeForest³ and Schoenberg that the iteration of smoothing
formulas which reproduce polynomials of higher degree gives rise to a sequence 
of limiting distributions which have the general appearance of the normal curve 
in the center portion and of a damped sine curve in the tails. This is, however, at 
best, a technical exception to Dantzig’s statement, as one is still faced with his 
basic proposition that repeated application of a smoothing formula to a frequency 
distribution will cause the smoothed distribution to be dominated by the char- 
acteristics of the smoothing formula rather than those of the original data. 

While he did not intend the statement to refer to data not in the form of a 
frequency distribution, some readers seem to have interpreted it as being of 
general application, and, for that reason, I should like to point out a few of the 
considerations involved in applying iterated smoothing to other types of data, 
such as, for example, a time series or the values of a mathematical function. 
The limit Γ, on whose vanishing Dantzig's theorem depends, involves the
second and fourth moments of the original data (as well as of the weight dis- 
tribution) and, therefore, can be computed only if these moments exist. For 
this it is necessary (but, of course, not sufficient) that the function being smoothed 
shall tend toward zero as the independent variable approaches positive or 
negative infinity. 

In order to iterate a smoothing formula an infinite number of times, it is 
obviously necessary to have an infinite set of original values. Therefore, in 
smoothing, for example, a finite time series, one would have to make some 
assumption regarding the values of the series outside the range for which they 


² I. J. SCHOENBERG, "Some analytical aspects of the problem of smoothing," Courant
Anniversary Volume, Interscience Publishers, New York, 1948.

³ H. H. WOLFENDEN, "On the development of formulae for graduation by linear
compounding, with special reference to the work of Erastus L. DeForest," Trans. Actuarial
Soc. Am., Vol. 26 (1925), pp. 81-121.

are actually available. Of course, if it were assumed that the values were zero
outside this range, Dantzig’s theorem would apply. However, under this assump- 
tion, infinite iteration of a smoothing formula would not be a rational procedure, 
as it would smooth each value to zero, and the incidental fact that the sequence 
of smoothed distributions, while approaching zero, also approach the form of a 
normal distribution, would not be a very valuable one. In this connection, an 
important distinction between time series and frequency data is that, in dealing 
with the former, one is interested in the magnitude of individual values as well 
as in the general form and shape of the distribution. In practice it might be 
preferable not to make any assumption about the values outside the given 
range but rather to employ special devices to obtain smoothed values near the 
ends of this range. In such a case, the smoothing process would be a function 
of the range (if not of the actual values) of the original data distribution. Such a 
process was not considered by Dantzig, and is clearly excluded by his definition 
of a linear smoothing formula, which requires that the formula be completely 
independent of the data to which it is applied. 

The somewhat academic question of the effect of iteration of a smoothing 
formula on a function of infinite range for which the moments do not exist, is a 
difficult one, to which I cannot give a general answer. Schoenberg does not 
consider this problem, but merely gives the weight distribution to be applied 
to the original data in order to obtain the limiting smoothed distribution. Two 
trivial examples may, however, serve to illustrate the nature of the considerations 
involved. If the original data are values of a polynomial of a specified degree, 
and if a smoothing formula which reproduces that degree is successively applied, 
it will of course continue indefinitely to reproduce the original values. On the 
other hand, if the smoothing formula reproduces only polynomials of lower 
degree, a bias is introduced. As a simple example, we may consider the case of
smoothing the function $y = x^2$ by a formula consisting of three weights each
equal to 1/3, to be applied to the given value and its two immediate neighbors.
With the values tabulated at unit intervals, it is easily shown that the smoothed
value is $x^2 + 2/3$, and the effect of successive application of this formula is to
add 2/3 each time. Thus each smoothed value
would tend toward infinity as the number of smoothings increases; however,
the entire distribution would always remain a parabola of the same form as
originally.
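The parabola example is easy to verify exactly. The sketch below assumes the values are tabulated one unit apart, so that the "two immediate neighbors" of x are x - 1 and x + 1, and uses exact rational arithmetic; the function names are hypothetical. Under that spacing each pass of the three-point mean adds 2/3.

```python
from fractions import Fraction

def smooth_once(f):
    """Three-point smoothing with weights 1/3 on x - 1, x and x + 1."""
    third = Fraction(1, 3)
    return lambda x: third * (f(x - 1) + f(x) + f(x + 1))

def smooth(f, passes):
    """Apply the three-point formula the given number of times."""
    for _ in range(passes):
        f = smooth_once(f)
    return f

def square(x):
    return Fraction(x) ** 2

after_one_pass = smooth(square, 1)(5)     # 77/3, that is 25 + 2/3
after_three_passes = smooth(square, 3)(5)  # 25 + 3 * (2/3) = 27
```

The increment does not depend on x, which is why the smoothed values remain on a parabola of the same shape.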

Finally, I should like to emphasize that, in common with Dr. Dantzig, I 
do not regard infinite repetition of the smoothing operation as a practical pro- 
cedure, but consider it preferable to select, in the first instance, a smoothing 
formula which is likely to have the desired effect and then to perform the smooth- 
ing in a single step. In this way, one is more likely to secure the result desired 
without losing sight of important characteristics of the original data. 
INDEPENDENCE OF QUADRATIC FORMS IN NORMALLY
CORRELATED VARIABLES¹


By YUKIYOSI KAWADA


Tokyo University of Literature and Science


The problem to give a necessary and sufficient condition that two quadratic 
forms in normally correlated variables are independent was treated by many 
authors [1], [2], [3], [4], [5]. We shall give here also a solution of this problem, 
which may be a generalization of that given by B. Matérn [6] for nonnegative 
quadratic forms to the general case. 

THEOREM 1. If two quadratic forms

(1) $$Q_1 = \sum_{i,j=1}^{n} a_{ij} x_i x_j, \qquad Q_2 = \sum_{i,j=1}^{n} b_{ij} x_i x_j$$

in normally correlated variables $x_1, \cdots, x_n$ with zero means and with the variance
matrix I satisfy the following four conditions

(2) $$F_{ij} = E(Q_1^i Q_2^j) - E(Q_1^i)E(Q_2^j) = 0 \qquad (i, j = 1, 2),$$

then the relation

(3) $$AB = 0 \qquad (A = (a_{ij}),\ B = (b_{ij}))$$

holds.

COROLLARY 1. If $Q_1, Q_2$ in (1) satisfy the four conditions (2), then $Q_1$ and $Q_2$
are independent.

COROLLARY 2. (Necessity portion of the theorem of Craig) A necessary
condition for the independence of $Q_1$ and $Q_2$ is AB = 0. (The sufficiency was
proved by Craig.)

PROOF OF THEOREM 1. The proof is very simple. Using the values $E(x_k^i) = 0$
(i = 1, 3, 5, 7), $E(x_k^2) = 1$, $E(x_k^4) = 3$, $E(x_k^6) = 15$, $E(x_k^8) = 105$ $(k = 1, \cdots, n)$,
we have by a straightforward calculation² the following relations

(4) $F_{11} = 2\,\mathrm{Tr}(AB)$,

(5) $F_{12} = 8\,\mathrm{Tr}(AB^2) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(B)$,

(6) $F_{21} = 8\,\mathrm{Tr}(A^2B) + 4\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A)$,

(7) $F_{22} = 32\,\mathrm{Tr}(A^2B^2) + 16\,\mathrm{Tr}((AB)^2) + 16\,\mathrm{Tr}(AB^2)\,\mathrm{Tr}(A) + 16\,\mathrm{Tr}(A^2B)\,\mathrm{Tr}(B)$
$\qquad\quad + 8\,\mathrm{Tr}(AB)\,\mathrm{Tr}(A)\,\mathrm{Tr}(B) + 8\,\mathrm{Tr}(AB)^2$.

1 Presented at the Chapel Hill meeting of the Institute of Mathematical Statistics and 
Biometric Society March 18, 1950. 

² If we apply an orthogonal transformation on $(x_1, \cdots, x_n)$ so that A becomes a diagonal
form, the calculation becomes simpler than with the general form. We may note here also
the fact that we need not assume that $x_1, \cdots, x_n$ are normally correlated, but we use
only the values of $E(x_k^i)$ $(i = 1, \cdots, 8)$ for our proof.







Put C = AB. Let C' be the transposed matrix of C. We have from (2), (4)-(7)

(8) $$2\,\mathrm{Tr}(A^2B^2) + \mathrm{Tr}((AB)^2) = 2\,\mathrm{Tr}(CC') + \mathrm{Tr}(C^2) = 0.$$

The left side of (8) is equal to $\sum_{i,j=1}^{n} (c_{ij}^2 + c_{ij}c_{ji} + c_{ji}^2)$, which is positive
unless all $c_{ij} = 0$ $(i, j = 1, \cdots, n)$. Hence we have C = AB = 0, q.e.d.
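The key step is that the left side of (8) is a sum of terms of the form $c^2 + cd + d^2$, each of which is positive unless c = d = 0. The sketch below checks the identity between the trace form and the sum form on a small pair of symmetric matrices chosen only for illustration; the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(X):
    return [list(row) for row in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

def trace_form(C):
    """2 Tr(CC') + Tr(C^2), the left side of (8)."""
    return 2 * trace(matmul(C, transpose(C))) + trace(matmul(C, C))

def sum_form(C):
    """Sum over all i, j of c_ij^2 + c_ij c_ji + c_ji^2."""
    n = len(C)
    return sum(C[i][j] ** 2 + C[i][j] * C[j][i] + C[j][i] ** 2
               for i in range(n) for j in range(n))

A = [[1, 2], [2, 0]]   # symmetric
B = [[0, 1], [1, 3]]   # symmetric
C = matmul(A, B)       # [[2, 7], [0, 2]]

C_zero = matmul([[1, 0], [0, 0]], [[0, 0], [0, 1]])   # a pair with AB = 0
```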

Corollary 1 follows from Theorem 1 and the theorem of Craig. Corollary 2 
results from observing that independence of Q, and Q, implies (2). 

B. Matérn proved that if A, B are nonnegative, then AB = 0 follows from a
single condition $F_{11} = 2\,\mathrm{Tr}(AB) = 0$. If only one of the matrices A, B is assumed
to be nonnegative, we have

THEOREM 2. Let A be nonnegative. Then from two conditions $F_{11} = 0$, $F_{12} = 0$
in (2) follows the relation AB = 0.

PROOF. From (4), (5) follows $\mathrm{Tr}(AB^2) = 0$. Since A is nonnegative, we can
choose a real symmetric matrix $A_0$ such that $A = A_0^2$. Put $C_0 = A_0 B$. Then
we have $\mathrm{Tr}(AB^2) = \mathrm{Tr}(C_0 C_0') = 0$ and from this follows $C_0 = 0$. Hence we have
also $AB = A_0 C_0 = 0$, q.e.d.
ERRATA TO "CONTROL CHART FOR LARGEST
AND SMALLEST VALUES"


By JOHN M. HOWELL
Los Angeles City College


In the paper cited in the title (Annals of Math. Stat., Vol. 20 (1949), p. 306), 
there are some numerical errors in Table I. Values of d2/2 and d, are given by 
H. J. Godwin in ‘Some Low Moments of Order Statistics’ in the same issue 
of the Annals. These values are more accurate than those heretofore available.
A corrected Table I based on these values is as follows: 


n 2 ds A2 A3 As 





. 8256 1.8800 
7480 1.0233 
7012 7286 
.6690 .5768 
6449 .4832 
.6260 4193 
.6107 3725 
.5978 | .3367 
.5868 . 3083 


3.0411 
3.0902 
3.1330 
. 1699 
3.2020 
. 2303 
2556 
2784 
2992 
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ABSTRACTS OF PAPERS 


(Abstracts of papers presented at the Berkeley meeting of the Institute, 
August 5, 1950) 


1. Sampling from Populations with Overlapping Clusters. Z. W. BIRNBAUM,
University of Washington, Seattle.


In cluster sampling it is usually assumed that the clusters are disjoint. In this paper
situations are considered in which this assumption is not fulfilled. Let the population π
consist of N individuals "j", having the variates V[j], j = 1, 2, ..., N, and let K clusters
C[i], i = 1, 2, ..., K, be such that each "j" belongs to at least one cluster. Let s[j] ≥ 1
be the number of different clusters to which "j" belongs (the multiplicity of "j"). The
cluster C[i] contains $N_i$ individuals with the variates V[i, t], t = 1, 2, ..., $N_i$;
i = 1, 2, ..., K. In a sampling procedure, let sub-sample sizes n[i] be given for each C[i],
and weights A[i, t] for each V[i, t]; a random sample of k clusters C[$i_u$], u = 1, 2, ..., k
is obtained, then n[$i_u$] individuals are sampled from C[$i_u$], and for each of them its
variate and its multiplicity are recorded. Necessary and sufficient conditions are derived for
$S = \sum_{u=1}^{k} \sum_{v=1}^{n[i_u]} V[i_u, t_v]\,A[i_u, t_v]$ being an unbiased estimate of
$\bar{V} = \frac{1}{N} \sum_{j=1}^{N} V[j]$. The
variance of S is found, the weights are studied which minimize this variance, and some
practically important special cases are derived.


2. A Simple Nonparametric Test of Independence. Nits BLomavist, University 
of Stockholm. 


Consider a sample of size $n$ from a two-dimensional distribution $F(x, y)$. Let $\tilde{x}$ and $\tilde{y}$ 
denote the two sample medians and compute the number of individuals, say $k$, satisfying 
the inequality $x < \tilde{x}$, $y < \tilde{y}$ (the trivial difficulty arising when $n$ is an odd number can 
easily be overcome). A test of independence based on $k$ is nonparametric. As a matter of 
fact one has under the null hypothesis that 

$$P(k) = \binom{m}{k}^{2} \Big/ \binom{n}{m},$$

where $m = [n/2]$. In the case of normal $F$ with correlation coefficient $\rho$ it is possible to 
show, by studying the asymptotic behavior of the power function of the test in the 
neighborhood of $\rho = 0$, that the asymptotic efficiency of the test is $(2/\pi)^2$, or about 41%. This 
result is based on the fact that $k$ has an asymptotically normal distribution if some 
regularity conditions are fulfilled. In spite of its low efficiency it is suggested that the test be 
used in cases where some information can be neglected in favor of the simplicity of the 
method. 
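
The null distribution stated above can be tabulated directly. The following is a minimal sketch in Python, assuming an even sample size $n = 2m$ with no ties; the function name `quadrant_count_pmf` is illustrative and does not come from the abstract. Exact rational arithmetic is used so that the probabilities can be checked to sum to one.

```python
from fractions import Fraction
from math import comb, pi


def quadrant_count_pmf(n):
    """Null distribution of k for even n = 2m: P(k) = C(m, k)^2 / C(n, m).

    Assumption of this sketch: n is even and there are no ties, so exactly
    m observations lie below each sample median.
    """
    if n <= 0 or n % 2 != 0:
        raise ValueError("this sketch covers positive even n only")
    m = n // 2
    total = comb(n, m)
    return {k: Fraction(comb(m, k) ** 2, total) for k in range(m + 1)}


# The probabilities sum to one (Vandermonde's identity).
pmf = quadrant_count_pmf(10)
assert sum(pmf.values()) == 1

# Asymptotic efficiency quoted in the abstract for the normal case.
asymptotic_efficiency = (2 / pi) ** 2
```

For $n = 4$ the sketch gives $P(0) = 1/6$, $P(1) = 2/3$, $P(2) = 1/6$, and `asymptotic_efficiency` evaluates to roughly 0.405, in line with the "about 41%" quoted above.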


3. On Minimax Statistical Decision Procedures and Their Admissibility. Colin 
R. Blyth, University of California, Berkeley. 


The problem considered is that of using a sequence of observations on a random variable 
$X$ to make a decision. Two loss functions $W_1$ and $W_2$, each depending on the distribution 
$F$ of $X$, the number $n$ of observations taken, and the decision $\delta$ made, are assumed given. 
Minimax problems can be stated for weighted sums of $W_1$ and $W_2$, or for either one subject 
to an upper bound on the expectation of the other. Under suitable conditions it is shown 
that solutions of the first type of problem provide solutions for all problems of the latter 
types, and that admissibility for a problem of the first type implies admissibility for 
problems of the latter types. Two examples are given: estimation of $EX$ when $X$ is (1) normal 
with known variance, (2) rectangular with known range. The two loss functions are in 
each case $W_1 = n$ and an arbitrary nondecreasing function $W_2(|\delta - \theta|)$. Admissible 
minimax estimates are obtained. Extensions to any function $W_1(n)$ are indicated; two 
examples are given for the normal case where the sample size must be randomised among 
more than a consecutive pair of integers. 


4. Sufficient Statistics and Unbiased Estimates for “Selected” Distributions. 
Douglas G. Chapman, University of Washington, Seattle. 


A family of distributions obtained from any given family by fixed selection may be 
called a “selected” family. Tukey’s theorem that such selected families admit the same 
set of sufficient statistics as the parent family is proved for an extended class of distribu- 
tions. Further if the selection does not involve truncation the existence of minimum vari- 
ance unbiased estimates of parameters of the parent family ensures the existence of similar 
estimates for the selected family. Some results are derived for minimum variance unbiased 
estimates for truncated distributions. 


5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. 
Davis, U. S. Naval Ordnance Testing Station, China Lake. 


Under weak regularity conditions it is shown that for the case in which the sample size 
is a nonrandom variable, certain lower bounds are unattainable. Consider a univariate 
chance variable $X$, possessing an absolutely continuous distribution function $F(x, \theta)$, in 
which $\theta$ is the unknown parameter. Under quite general regularity conditions Barankin 
has proved the existence and uniqueness of the locally best unbiased estimate of a 
function $g(\theta)$ for a specified parameter value $\theta_0$. The criterion of bestness is the minimization 
of the $s$th absolute central moment ($s > 1$) of the estimate about $g(\theta_0)$, and Barankin has 
obtained an expression for the lower bound both in the general case and in particular for 
a case which yields a generalization of the Cramér-Rao inequality valid for any $s$th 
absolute central moment. It is the latter lower bound with which we are concerned. With 
an additional weak assumption concerning the density function of $X$, it is shown that if 
$\varphi_s(x_1, x_2, \dots, x_n)$ is the locally best unbiased estimate of $g(\theta)$ (obtained by Barankin) 
for each fixed sample size $n$ and for each $s > 1$, then there exists no probability distribution 
$F(x, \theta)$ except for $s = 2$ yielding a sequence $\{\varphi_s(x_1, x_2, \dots, x_n)\}$ ($n = 1, 2, \dots$, ad inf.) 
in which $x_1, x_2, \dots, x_n$ are for each $n$ independently and identically distributed chance 
variables and for which $\varphi_s(x_1, x_2, \dots, x_n)$ attains for each $n$ the special lower bound given 
by Barankin. Obviously in the case $s = 2$, the lower bound is achieved by an efficient 
statistic if one exists. 


6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards, 
University of California, Berkeley. 


Values obtained by using the normal approximation to the noncentral $t$-distribution 
given by Johnson and Welch were compared with exact values given by Neyman and 
Tokarska. The comparison indicated that efficiencies of the sign test computed from the 
approximation would be consistently higher than the true efficiencies. To avoid this bias 
the sign test was randomized so that levels of significance of $\alpha = .05$ and $\alpha = .01$ were 
obtained and the exact values of the noncentral $t$ used. Efficiencies were computed using 
various measures of equivalence of the power functions: (1) balancing the area (Walsh), 
(2) minimizing the maximum difference, (3) equalizing the power at certain fixed points. 
The various measures of equivalence yielded no marked differences in efficiencies. Tables 
were given of the efficiencies for small $n$. The efficiency for $\alpha = .05$ was about .7 for $n$ 
between 6 and 20 and somewhat higher for $\alpha = .01$. The efficiency slowly approaches the 
asymptotic value of $2/\pi = .6366$ as $n$ increases. 
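
Randomizing a discrete test so that it attains an exact level can be sketched as follows. This is only an illustration under stated assumptions, not the authors' computation: a one-sided sign test is assumed (the abstract does not say whether the test was one- or two-sided), the number of positive signs is taken to be binomial with parameters $n$ and 1/2 under the null hypothesis, and the names `randomized_sign_test` and `exact_size` are hypothetical.

```python
from fractions import Fraction
from math import comb, pi


def randomized_sign_test(n, alpha):
    """Return (c, gamma) for a one-sided randomized sign test of exact size alpha.

    Rule assumed in this sketch: reject if the number X of positive signs
    exceeds c, reject with probability gamma if X == c, otherwise accept.
    Under the null hypothesis X is Binomial(n, 1/2).
    """
    alpha = Fraction(alpha)
    pmf = [Fraction(comb(n, x), 2 ** n) for x in range(n + 1)]
    tail = Fraction(0)  # holds P(X > c) as c runs down from n
    for c in range(n, -1, -1):
        if tail + pmf[c] > alpha:
            gamma = (alpha - tail) / pmf[c]
            return c, gamma
        tail += pmf[c]
    raise ValueError("alpha must be less than 1")


def exact_size(n, c, gamma):
    """Null rejection probability of the randomized rule (c, gamma)."""
    pmf = [Fraction(comb(n, x), 2 ** n) for x in range(n + 1)]
    return sum(pmf[c + 1:], Fraction(0)) + gamma * pmf[c]


# Asymptotic efficiency of the sign test quoted in the abstract.
ASYMPTOTIC_EFFICIENCY = 2 / pi
```

For example, `randomized_sign_test(10, Fraction(1, 20))` yields a critical value of 8 with a randomization probability of 67/75, and `exact_size` confirms that the resulting rule has size exactly .05.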


7. About Some Classes of Sequential Procedures for Obtaining Confidence 
Intervals of Given Length. (Preliminary Report). Werner R. Leimbacher, 
University of California, Berkeley. 


The special class $C_1$ of such procedures indicated by A. Wald (Sequential Analysis, John 
Wiley & Sons, 1947, pp. 145-156) can be extended by generalizing and improving the 
inequality on which the procedures are based. It is shown that even in this larger class $C_2$, 
a procedure could possibly be optimum only under very special circumstances. The well 
known optimum procedure for a normal distribution $N(\theta, 1)$ can be obtained as the limit 
of a sequence of procedures from $C_2$. For the suggested sequence, however, the limit no 
longer belongs to $C_2$. In order to eliminate various deficiencies of $C_2$, a modified class $C_3$ 
is proposed which contains the well known optimum procedures for the normal and 
rectangular distributions. The method indicated seems suggestive for the general case of 
estimating location parameters by confidence intervals. 


8. On the Stochastic Independence of Symmetric and Homogeneous Linear 
and Quadratic Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing 
Station, China Lake. 


It is known that the sampling distributions of the mean and of the variance are stochas- 
tically independent if and only if the parent distribution is normal. This was proven by 
R. C. Geary (Jour. Roy. Stat. Soc., Suppl., Vol. 3 (1936)) and using a different method by 
E. Lukacs (Annals of Math. Stat., Vol. 13 (1942)). The question arises whether there are 
any distributions having the property that the sampling distributions of the mean and of a 
symmetric and homogeneous quadratic statistic are independent. It can be shown that 
there are only the following possibilities: (1) the parent distribution is normal, (2) the 
parent distribution is degenerate with a single saltus of one, (3) the parent distribution is 
a step function with two steps, located symmetrically with respect to zero, (4) the parent 
distribution is a gamma distribution. 


9. The Distribution of the Maximum Deviation between Two Sample Cumula- 
tive Step Functions. Frank J. Massey, Jr., University of Oregon. 


Let $x_1 < x_2 < \dots < x_n$ and $y_1 < y_2 < \dots < y_m$ be the ordered results of two random 
samples from populations having continuous cumulative distribution functions $F(x)$ and 
$G(x)$ respectively. Let $S_n(x) = k/n$ when $k$ is the number of observed values of $X$ which 
are less than or equal to $x$, and similarly let $S'_m(y) = j/m$ where $j$ is the number of observed 
values of $Y$ which are less than or equal to $y$. The statistic $d = \max_x |S_n(x) - S'_m(x)|$ can 
be used to test the hypothesis $F(x) = G(x)$, where the hypothesis would be rejected if the 
observed $d$ is significantly large. In this paper a method of obtaining the exact distribution 
of $d$ for small samples is described, and a short table for equal size samples is included. 
The general technique is that used by the author for the single sample case. There is a 
lower bound to the power of the test against any specified alternative. This lower bound 
approaches one as $n$ and $m$ approach infinity, proving that the test is consistent. 
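
The statistic $d$ and its exact null distribution for very small samples can be illustrated with a short Python sketch. The brute-force enumeration below is an assumption made for illustration and is not the method described in the paper: it relies only on the fact that, when $F = G$ is continuous, every ordering of the pooled sample is equally likely. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def max_deviation(xs, ys):
    """d = max over z of |S_n(z) - S'_m(z)| for two samples xs and ys."""
    n, m = len(xs), len(ys)
    d = Fraction(0)
    # The step functions change only at observed values, so it is enough
    # to evaluate the difference at each pooled observation.
    for z in sorted(set(xs) | set(ys)):
        s_x = Fraction(sum(1 for x in xs if x <= z), n)
        s_y = Fraction(sum(1 for y in ys if y <= z), m)
        d = max(d, abs(s_x - s_y))
    return d


def exact_null_distribution(n, m):
    """Exact distribution of d under F = G, by enumerating all orderings.

    Each of the C(n+m, n) assignments of pooled ranks to the X sample is
    equally likely; feasible only for small n and m.
    """
    counts = {}
    positions = range(n + m)
    for x_pos in combinations(positions, n):
        x_set = set(x_pos)
        ys = [p for p in positions if p not in x_set]
        d = max_deviation(list(x_pos), ys)
        counts[d] = counts.get(d, 0) + 1
    total = sum(counts.values())
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}
```

With $n = m = 2$ the enumeration covers six orderings and gives $P(d = 1/2) = 2/3$ and $P(d = 1) = 1/3$. The cost grows as $\binom{n+m}{n}$, so this is a check for small cases rather than a practical tabulation method.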


10. An Iterative Construction of the Optimum Sequential Decision Procedure 
with Linear Cost Function. Lincoln E. Moses, Stanford University. 


Where the cost of taking $n$ observations is proportional to $n$, define a sequential decision 
procedure $D_T$ by means of its associated “stopping region” $T$; $T$ is the set of a posteriori 
probability distributions $\xi(\theta)$ for which $D_T$ instructs the statistician to take no 
observation and to make the decision which minimizes the Bayes risk. Now let $D_T$ be any 
sequential decision procedure which has uniformly bounded average risk for every a priori 
distribution $\xi(\theta)$. Define $T'$ as the derived region of $T$: $T'$ is the set of $\xi(\theta)$ such that the Bayes 
risk of stopping at $\xi(\theta)$ is not greater than the risk of taking one observation and then 
using $D_T$. Define $T^{(n+1)} = (T^{(n)})'$. Then it is shown that the sequence of regions $\{T^{(n)}\}$, 
$n = 1, 2, \dots$, is monotonically decreasing to a limit region, and that the procedure associated 
with this limit region is the optimum sequential decision procedure. Some numerical examples 
are given where the exact solution is obtained and the convergence of the iteration is examined. 
(This paper was prepared under the sponsorship of the Office of Naval Research.) 


11. On the Law of the Iterated Logarithm for Dependent Random Variables. 
Stanley W. Nash, University of California, Berkeley. 


The order of the remainder term is evaluated in the distribution function of the 
asymptotically normal sum $S_n$ of dependent random variables of a certain class considered by 
Loève. Bounds are found for the probability that $\max_{k \le n} |S_k|$ exceeds a bound expressed 
in terms of $B_n$, where $B_n$ is the sum of the variances of components of $S_n$. Given an 
infinite sequence of events $A_n$, a necessary and sufficient condition is found for the 
probability that infinitely many $A_n$ occur to equal one. This criterion extends criteria due 
to Borel. With these results established, the law of the iterated logarithm is shown to hold 
for a wide subclass of Loève’s class of dependent random variables. Within this class the 
partial sum $S_n - S_j$ may approach normality with a speed which depends in a certain 
functional way on the previous sum $S_j$, and which may be arbitrarily slow for some values 
of $S_j$. The conclusions generalize earlier results due to W. Doeblin and N. A. Sapogov. 


12. Conditional Expectation and the Efficiency of Estimates. Paul G. Hoel, 
University of California, Los Angeles. 


A probability density function, $f(x; \theta)$, is considered for which the range of $x$ does not 
depend on $\theta$ and for which there exists a sufficient statistic for $\theta$. It is shown that under 
certain regularity conditions, there exists a unique unbiased sufficient estimate of $\theta$ among 
those sufficient estimates which can be expressed as functions of a particular sufficient 
statistic. This result, together with results of other authors, is used to show that for the 
class of statistics satisfying the regularity conditions, the method of Blackwell for 
improving an unbiased estimate of $\theta$ does not yield an essentially better estimate than a well 
known estimate. 


13. Optimum Estimates for Location and Scale Parameters. Raymond P. 
Peterson, University of California and National Bureau of Standards, 
Los Angeles. 


Let $h_i(W \mid E, \theta) = W(\hat\theta_i, \theta)\, p(E \mid \theta)$, where $p(E \mid \theta)$ is the joint probability density 
function of the $n$ (not necessarily independent) sample values $x_1, \dots, x_n$, which may be 
represented as a point $E = (x_1, \dots, x_n)$ in the $n$-dimensional Euclidean sample space $M$. 
The unknown parameters, $\theta_1, \dots, \theta_s$, may be represented as a point $\theta = (\theta_1, \dots, \theta_s)$ 
in the $s$-dimensional Euclidean parameter space $\Omega$. $W(\hat\theta_i, \theta)$ is a real-valued, nonnegative, 
measurable weight function, defined for all $E$ in $M$ and $\theta$ in $\Omega$, which represents the 
relative seriousness of taking the estimate $\hat\theta_i(E)$ as the value of $\theta_i$ for any particular sample 
point $E$. Let $G(\theta)$ be the unknown cumulative distribution function of $\theta$. Then $\theta_i^*(E)$ is 
defined to be a best estimate of $\theta_i$, provided that, if $\hat\theta_i(E)$ is any other estimate of $\theta_i$ in 
the class under consideration, $I - I^* \ge 0$, where 

$$I = \int_{\Omega} \int_{M} h_i(W \mid E, \theta)\, dE\, dG(\theta).$$

Let 

$$r_i(\theta) = \int_{M} h_i(W \mid E, \theta)\, dE, \qquad \varphi_i(E) = \int_{\Omega} h_i(W \mid E, \theta)\, d\theta.$$

A general theorem is proved to the effect that if $h_i(W \mid E, \theta)$ is measurable over the product 
space $M \times \Omega$ and if $r_i(\theta)$ and $\varphi_i(E)$ are uniformly convergent integrals, then a best estimate 
$\theta_i^*(E)$ of $\theta_i$ exists provided that $r_i(\theta)$ is constant and that $\theta_i^*(E)$ minimizes $\varphi_i(E)$ for all 
points $E$ in $M$. General methods are obtained for constructing best estimates for location 
and scale parameters, separately or jointly, and for functions of location and scale 
parameters from several populations. As special cases, results are derived which are analogous 
to converses of Theorems 1 and 2 in Kallianpur’s “Minimax Estimates of Location and 
Scale Parameters,” Abstract (Annals of Math. Stat., Vol. 21 (1950), pp. 310-311). 


NEWS AND NOTICES 
Readers are invited to submit to the Secretary of the Institute news items of interest. 


Personal Items 


Professor William Feller of Cornell University has been appointed Eugene 
Higgins Professor of Mathematics at Princeton University. 

Dr. Leonard Kent, formerly on the staff at the University of Chicago in the 
School of Business, is now with the firm of Alderson and Sessions, 1905 Walnut 
Street, Philadelphia 3, Pennsylvania. 

Dr. G. B. Oakland has resigned an associate professorship of statistics at the 
University of Manitoba to accept the position as Head of Biometrics Unit, 
Division of Administration, Department of Agriculture, Ottawa. 

Dr. Norman Rudy has accepted an appointment as Assistant Professor at 
Sacramento State College, Sacramento, California. 

Professor G. R. Seth has returned to India to accept the position of Professor 
of Statistics and Deputy Statistical Advisor to the Indian Council of Agricultural 
Research, New Delhi. 

Mr. Eric Weyl, textile engineering consultant, formerly of Manchester, New 
Hampshire, has moved his office to 2509 Vail Avenue, Charlotte, North Carolina. 
Mr. Weyl, a specialist in cotton spinning, serves as regular consultant to many 
leading textile mills. 




The completion and successful operation of SEAC—the National Bureau of 
Standards Eastern Automatic Computer—has been achieved by electronic scien- 
tists of the National Bureau of Standards. SEAC is a high-speed, general-purpose, 
automatically-sequenced electronic computer. It was developed and constructed, 
in a period of 20 months, by the staff of the National Bureau of Standards under 
the sponsorship of the Department of the Air Force to provide a high-speed 
computing service for Air Force Project SCOOP (Scientific Computation of 
Optimum Programs), a pioneering effort in the application of scientific principles 
to the large-scale problems of military management and administration. SEAC 
will also be available for solving important NBS problems of general scientific 
and engineering interest. 


New Members 
The following persons have been elected to membership in the Institute 


(June 1, 1950 to August 31, 1950) 


Aven, Russell E., M.A. (Univ. of Miss.), Graduate student, University of Mississippi, 
1511 North Main St., Water Valley, Mississippi. 

Bamberger, Gunter, Dip.-Math. (Univ. Göttingen), Division head in the Statistical Office 
of the City of Cologne, Manderscheider Platz 12, Cologne-Sulz, Germany. 

Bangdiwala, Ishver S., M.S. (Univ. N. C.), Graduate student, University of North Caro- 
lina, 210 A. Phillips Hall, University of North Carolina, Chapel Hill. 

Borch, Karl Henrik, M.Sc. (Oslo Univ.), Field Science Officer for Middle East, UNESCO, 
19 Avenue Kléber, Paris 16e, France. 

Buch, Kai R., M.Sc., Assistant Professor, Technical University of Denmark, Figaardsvej 
14 A?, Charlottenlund, Denmark. 

Carranza, Roque G., Ingeniero Industrial (Univ. Buenos Aires), Consultant Industrial 
Engineer, Parana 56, Buenos Aires, Argentina. 

Dominguez, Alberto G., Ph.D. (Univ. Buenos Aires), Professor of Mathematics, Facultad 
de Ciencias Exactas, Fisicas y Naturales, University of Buenos Aires, Paraguay 1327, 
Buenos Aires, Argentina. 

Dunaway, William L., B.S. (Univ. of Calif.), Graduate student, Dept. of Mathematical 
Statistics, University of California, 4820 Cahuenga Boulevard, North Hollywood, Cali- 
fornia. 

Fernandez, Jose J., Professor, University of Costa Rica, Ap. 1313, San Jose, Costa Rica. 

Fortet, Robert, Ph.D. (Paris), Professor, Department of Science de Caen, 168 Rue 
Caponiere, Caen (Calvados), France. 

Geppert, Maria-Pia, Ph.D. (Univ. of Giessen), Lecturer, University of Frankfurt; Head 
of Statistical Laboratory, Kerckhoff-Institute, Bad Nauheim; Lecturer, Technical 
High School, Darmstadt, Germany. 

Görtler, J. Henry, Ph.D. (Univ. of Göttingen and Univ. of Giessen), Professor of Applied 
Mathematics and Dean of the Faculty of Natural Sciences and Mathematics, University 
of Freiburg i. Br.; Manager of “Gesellschaft für angewandte Mathematik und 
Mechanik”; Stadtstrasse 57, Freiburg i. Br., Germany. 

Guilbaud, George T., Agrégé de l’Univ. (Paris), Chief, Section à l’Institute of Science 
Economique Appliquee, Paris, and Professor, Institute of Statistics, University of 
Paris, 35 Boulevard des Capucines, Paris 2, France. 

Holloway, Clark, Jr., M.S. (Univ. of Ill.), Process Research Engineer, Gulf Research and 
Development Co., P.O. 2088, Pittsburgh 30, Pennsylvania. 

Lieberman, Gilbert, M.A. (Columbia Univ.), Mathematician, U.S. Naval Research Labora- 
tory, 220 Newcomb St., S.E., Washington 20, D.C. 

Lomax, K. S., M.A. (Manchester Univ.), Lecturer in Economic Statistics, Economics De- 
partment, The University, Manchester, England. 

Lorenz, Paul, Ph.D., Professor, University of Berlin, Kaiserstuhlstrasse 21, Berlin-Schlach- 
tensee, Germany. 

Lunger, George F., M.B.A. (Univ. of Mich.), Statistician, Great Lakes Investigations, Fish 
and Wildlife Service, Department of the Interior, 2110 Arbor View Blvd., Ann Arbor, 
Michigan. 

Maggy, Robert K., M.A. (Univ. of Calif.), Graduate student, University of California, 
1685 Euclid Avenue, Berkeley 9, California. 

McElrath, Gayle W., M.S. (Univ. of Mich.), Assistant Professor, Department of Engineer- 
ing, 208 Main Engineering Building, University of Minnesota, Minneapolis, Minnesota. 

Neisius, W. Vincent, M.S. (Emory Univ.), Mathematics Instructor, Georgia Institute of 
Technology, 597 St. Charles Avenue, N.E., Atlanta &, Georgia. 

Perloff, Robert, M.A. (Ohio State Univ.), Graduate student and Research Assistant, 
Research Foundation, Ohio State University, 1281 Bryden Road, Columbus 6, Ohio. 

Peter, Hans, Dr. rer. pol., Professor of Economics, University of Tübingen, 
Tübingen-Waldhausen 29, Germany. 

Putter, Joseph, M.Sc. (Hebrew Univ., Jerusalem), International House, Berkeley 4, Cali- 
fornia. 

Rankin, Bayard, A.B. (Univ. of Calif.), Graduate student, University of California, Inter- 
national House, Berkeley 4, California. 

Reid, Albert T., B.S. (Iowa State College), Research Assistant in Mathematical Biology, 
Committee on Mathematical Biology, University of Chicago, 5741 Drexel Avenue, 
Chicago 37, Illinois. 

Shaw, Albert, B.S. (Univ. of Alberta), Lecturer, University of Alberta, Department of 
Mathematics, University of Alberta, Edmonton, Alberta, Canada. 

Shuhany, Elizabeth, A.M. (Boston Univ.), Assistant Instructor in Statistics and Assistant 
in Statistical Laboratory of Mathematics, Boston University, 725 Commonwealth 
Avenue, Boston 15, Massachusetts. 

Stewart, John N., B.A. (Univ. of Michigan), Graduate student, University of Michigan, 
4834 Chatsworth, Detroit 24, Michigan. 

Strecker, Heinrich, Doctor der Naturwissenschaften (Univ. München), Mathematical 
Statistician in the Bavarian Statistical Office, Rosenheimerstrasse 130, Munich 8, 
Germany. 

Vaswani, Sundri (Miss) Ph.D. (Univ. of London), Research Associate in Statistics, c/o 
Ahmedabad Textile Industry’s Research Association, P.O. Box 170, Ahmedabad, India. 


REPORT OF THE BERKELEY MEETING OF THE INSTITUTE 


The forty-fourth meeting of the Institute of Mathematical Statistics was 
held on August 5, 1950, on the Berkeley campus of the University of California, 
in conjunction with the Second Berkeley Symposium on Mathematical Statistics 
and Probability which met from July 31 through August 12. Other organizations 
cooperating with the Symposium were the Biometrics Section of the American 
Statistical Association, The Western North American Region of the Biometric 
Society, the Econometric Society, the Institute of Transportation and Traffic 
Engineering of the University of California, and the Office of Naval Research. 
Some 218 persons registered for the Symposium, including the following 106 
members of the Institute: 


T. W. Anderson, Fred C. Andrews, Jane F. Andrian, Kenneth J. Arrow, Edward W. 
Barankin, Helen P. Beard, Robert D. Bedwell, Blair M. Bennett, Joseph Berkson, Z. W. 
Birnbaum, David Blackwell, E. Blanco, Nils Blomqvist, Julius R. Blum, Colin R. Blyth, 
A. H. Bowker, George W. Brown, Douglas G. Chapman, C. L. Chiang, K. L. Chung, William 
G. Cochran, Harald Cramér, Edwin L. Crow, J. H. Curtiss, R. C. Davis, W. J. Dixon, J. L. 
Doob, A. Dvoretzky, Mary Elveback, Benjamin Epstein, Mark W. Eudey, Edward A. Fay, 
William Feller, Edgar H. Fickenscher, E. Fix, William R. Gaffey, Robert S. Gardner, S. G. 
Ghurye, M. A. Girshick, Paul Gutt, Jack C. Gysbers, T. E. Harris, J. L. Hodges, Jr., Wassily 
Hoeffding, Paul G. Hoel, Harold Hotelling, John M. Howell, Harry M. Hughes, R. F. 
Jarrett, T. A. Jeeves, Mark Kac, Joseph Kampé de Fériet, E. S. Keeping, Ryoichi Kikuchi, 
Wilfred M. Kincaid, H. S. Konijn, Charles H. Kraft, George M. Kuznets, E. L. Lehmann, 
Roy B. Leipnik, Paul Lévy, M. Loève, Arvid T. Lonseth, Eugene Lukacs, C. A. Magwire, 
Jacob Marschak, Thomas Marschak, F. J. Massey, Jr., A. M. Mood, Lincoln E. Moses, 
James T. McWilliam, Stanley W. Nash, J. Neyman, Howard C. Nielson, Gottfried E. 
Noether, Stefan Peters, John C. Petersen, Raymond P. Peterson, Robert I. Piper, Joseph 
Putter, Robert R. Putz, Bayard Rankin, Fred D. Rigby, David Rubinstein, Elizabeth L. 
Scott, Esther Seiden, Arthur Shapiro, Richard H. Shaw, Ronald W. Shephard, W. B. 
Simpson, Monroe Sirken, M. Sobel, Herbert Solomon, A. L. Stewart, Donald E. Stiling, G. 
Szego, Robert Tate, William F. Taylor, Leo J. Tick, A. W. Tucker, Elizabeth Vaughan, 
Shanti A. Vora, Abraham Wald, Allen Wallis, J. Wolfowitz, Miriam L. Yevick. 


Because of the extensive program of more than fifty invited addresses at the 
Symposium, the Institute meeting was devoted only to contributed papers. 
Professor David Blackwell of Howard and Stanford Universities presided at 
the Institute meeting, at which the following program was presented: 


1. Sampling from Populations with Overlapping Clusters. Z. W. Birnbaum, University of 
Washington, Seattle. 

2. A Simple Nonparametric Test of Independence. Nils Blomqvist, University of Stock- 
holm. 

3. On Minimax Statistical Decision Procedures and their Admissibility. Colin R. Blyth, 
University of California, Berkeley. 

4. Sufficient Statistics and Unbiased Estimates for ‘‘Selected” Distributions. Douglas G. 
Chapman, University of Washington, Seattle. 

5. The Unattainability of Certain Lower Bounds by Product Densities. R. C. Davis, U. S. 
Naval Ordnance Testing Station, China Lake. 

6. A Note on the Power of the Sign Test. T. A. Jeeves and Robert Richards, University 
of California, Berkeley. 

7. About Some Classes of Sequential Procedures for Obtaining Confidence Intervals of Given 
Length. (Preliminary report). Werner R. Leimbacher, University of California, Berkeley. 

8. On the Stochastic Independence of Symmetric and Homogeneous Linear and Quadratic 
Statistics. Eugene Lukacs, U. S. Naval Ordnance Testing Station, China Lake. 

9. The Distribution of the Maximum Deviation between Two Sample Cumulative Step 
Functions. Frank J. Massey, Jr., University of Oregon. 

10. An Iterative Construction of the Optimum Sequential Decision Procedure with Linear 
Cost Function. Lincoln E. Moses, Stanford University. 

11. On the Law of the Iterated Logarithm for Dependent Random Variables. Stanley W. 
Nash, University of California, Berkeley. 

12. Conditional Expectation and the Efficiency of Estimates. (By title). Paul G. Hoel, 
University of California, Los Angeles. 

13. Optimum Estimates for Location and Scale Parameters. (By title). Raymond P. Peter- 
son, University of California and National Bureau of Standards, Los Angeles. 


The social activities at the Symposium included a tea on August 1, an excur- 
sion on August 3, a dinner on August 7, a picnic on August 9, and coffee on 
July 31 and August 2, 4, 7, 8, 10, and 11. 


J. L. Hodges, Jr. 
Associate Secretary 